gatb / minia
Minia is a short-read assembler based on a de Bruijn graph
Home Page: https://gatb.inria.fr/software/minia
License: GNU Affero General Public License v3.0
Hi,
Can I use Minia as an RNA-seq de novo assembler?
Best,
Kun
Hello,
I'm using Minia to perform local assembly of linked-read data and then realign the resulting contigs to determine breakpoints, as part of an SV calling tool.
However, when I run Minia several times with the same parameters on the same input file, it outputs slightly different assemblies.
Although the assemblies are very similar, some of them occasionally throw off the breakpoint computation (admittedly, part of the blame is probably on my side).
I've experimented with the parameters but couldn't find a combination that yields deterministic assemblies. Is this expected behaviour, or am I overlooking something?
Thanks.
Best,
Pierre
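To tell whether two runs differ only in contig order and orientation or in actual sequence, one option is to compare canonicalised contig sets. Below is a minimal POSIX sh + awk sketch, assuming the assemblies are plain FASTA files such as Minia's .contigs.fa output; canonical_contigs and same_assembly are hypothetical helper names, not part of Minia.

```shell
#!/bin/sh
# Sketch: compare two contig FASTA files while ignoring contig order,
# FASTA headers and orientation (a contig equals its reverse complement).
# canonical_contigs and same_assembly are hypothetical helper names.

# Print one canonical sequence per contig (the smaller of the sequence and
# its reverse complement, byte-wise), then sort the whole list.
canonical_contigs() {
  LC_ALL=C awk '
    function revcomp(s,    i, c, k, r) {
      r = ""
      for (i = length(s); i >= 1; i--) {
        c = substr(s, i, 1)
        k = index("ACGT", c)
        if (k > 0) c = substr("TGCA", k, 1)   # complement A/C/G/T, keep N etc.
        r = r c
      }
      return r
    }
    function emit(s,    rc, out) {
      if (s == "") return
      rc = revcomp(s)
      out = (s < rc) ? s : rc
      print out
    }
    /^>/ { emit(seq); seq = ""; next }          # header: flush previous contig
    { gsub(/[ \t\r]/, ""); seq = seq toupper($0) }  # join wrapped sequence lines
    END { emit(seq) }
  ' "$1" | LC_ALL=C sort
}

# Exit status 0 if both files contain the same multiset of contigs.
same_assembly() {
  [ "$(canonical_contigs "$1")" = "$(canonical_contigs "$2")" ]
}
```

For example, same_assembly run1.contigs.fa run2.contigs.fa && echo identical (file names hypothetical). Contigs that are split at different positions still count as different, which is the kind of difference that can shift breakpoints.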
Typo: "when solidity-kind is cutom, specifies" should read "custom".
Hi Minia Creators,
Do you have any idea how Minia behaves on targeted genes from high-ploidy samples (tumors)?
Best regards,
JB
During the [DSK: Pass 1/1, Step 2: counting kmers] step, the memory limit seems to be ignored: Minia exceeded the RAM limit and started swapping.
The old version of Minia was very fast and memory-efficient. What about 3.2.0?
Could you rename your h5dump to minia-h5dump?
Hello -
I am trying to troubleshoot a genome assembly, and in reading the minia log file I see several thousand instances of "Weird, there was supposed to be an in-neighbor. Maybe there's a loop. Remove this print if it never happensin degree 1" - does this have any impact on the output? If so, how can I mitigate the issue? Thank you.
How can I use barcoded reads from 10X data as input to Minia?
I built https://github.com/GATB/minia/releases/download/v3.2.0/minia-v3.2.0-Source.tar.gz
When I run minia, it prints:
Minia 3, git commit fatal: not a git repository (or any of the parent directories): .git
I was hoping to see 3.2.0.
Can I pass the -1, -2, --mp-1 and --mp-2 flags multiple times to load multiple input files, or do I have to concatenate all files together (which is what I did previously)? If they have to be concatenated, can the --mp-12 and -12 flags be used at the same time?
../gatb-minia-pipeline/gatb --max-memory 150000 --nb-cores 38
--mp-1 /path/s-11kb_1_R1.fq.gz
--mp-2 /path/s-11kb_1_R2.fq.gz
--mp-1 /path/s-11kb_2_R1.fq.gz
--mp-2 /path/s-11kb_2_R2.fq.gz
--mp-1 /path/s-11kb_3_R1.fq.gz
--mp-2 /path/s-11kb_3_R2.fq.gz
--mp-1 /path/s-11kb_4_R1.fq.gz
--mp-2 /path/s-11kb_4_R2.fq.gz
--mp-1 /path/s-11kb_5_R1.fq.gz
--mp-2 /path/s-11kb_5_R2.fq.gz
--mp-1 /path/s-11kb_6_R1.fq.gz
--mp-2 /path/s-11kb_6_R2.fq.gz
--mp-1 /path/s-11kb_7_R1.fq.gz
--mp-2 /path/s-11kb_7_R2.fq.gz
--mp-1 /path/s-11kb_8_R1.fq.gz
--mp-2 /path/s-11kb_8_R2.fq.gz
-1 /path/s-400bp_1_R1.fq.gz
-2 /path/s-400bp_1_R2.fq.gz
-1 /path/s-400bp_2_R1.fq.gz
-2 /path/s-400bp_2_R2.fq.gz
-1 /path/s-400bp_3_R1.fq.gz
-2 /path/s-400bp_3_R2.fq.gz
-1 /path/s-400bp_4_R1.fq.gz
-2 /path/s-400bp_4_R2.fq.gz
-1 /path/s-400bp_5_R1.fq.gz
-2 /path/s-400bp_5_R2.fq.gz
-1 /path/s-400bp_6_R1.fq.gz
-2 /path/s-400bp_6_R2.fq.gz
-1 /path/s-400bp_7_R1.fq.gz
-2 /path/s-400bp_7_R2.fq.gz
-1 /path/s-400bp_8_R1.fq.gz
-2 /path/s-400bp_8_R2.fq.gz
-1 /path/s-500bp_1_R1.fq.gz
-2 /path/s-500bp_1_R2.fq.gz
-1 /path/s-500bp_2_R1.fq.gz
-2 /path/s-500bp_2_R2.fq.gz
-1 /path/s-500bp_3_R1.fq.gz
-2 /path/s-500bp_3_R2.fq.gz
-1 /path/s-500bp_4_R1.fq.gz
-2 /path/s-500bp_4_R2.fq.gz
-1 /path/s-500bp_5_R1.fq.gz
-2 /path/s-500bp_5_R2.fq.gz
-1 /path/s-500bp_6_R1.fq.gz
-2 /path/s-500bp_6_R2.fq.gz
-1 /path/s-500bp_7_R1.fq.gz
-2 /path/s-500bp_7_R2.fq.gz
-1 /path/s-500bp_8_R1.fq.gz
-2 /path/s-500bp_8_R2.fq.gz
--mp-1 /path/s-8kb_1_R1.fq.gz
--mp-2 /path/s-8kb_1_R2.fq.gz
--mp-1 /path/s-8kb_2_R1.fq.gz
--mp-2 /path/s-8kb_2_R2.fq.gz
--mp-1 /path/s-8kb_3_R1.fq.gz
--mp-2 /path/s-8kb_3_R2.fq.gz
--mp-1 /path/s-8kb_4_R1.fq.gz
--mp-2 /path/s-8kb_4_R2.fq.gz
--mp-1 /path/s-8kb_5_R1.fq.gz
--mp-2 /path/s-8kb_5_R2.fq.gz
--mp-1 /path/s-8kb_6_R1.fq.gz
--mp-2 /path/s-8kb_6_R2.fq.gz
--mp-1 /path/s-8kb_7_R1.fq.gz
--mp-2 /path/s-8kb_7_R2.fq.gz
--mp-1 /path/s-8kb_8_R1.fq.gz
--mp-2 /path/s-8kb_8_R2.fq.gz
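For the standalone minia binary (not the gatb pipeline asked about here), the segfault report below runs minia -in kept_fastqs.txt, that is, it passes a text file listing the read files. Below is a minimal sketch for building such a list, assuming -in accepts one read-file path per line; make_read_list is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: write one read-file path per line into a list file, assuming
# the standalone minia accepts such a list through -in.
# make_read_list is a hypothetical helper name.

# Usage: make_read_list LIST_FILE READ_FILE...
make_read_list() {
  list=$1
  shift
  : > "$list"                        # truncate: start with an empty list
  for f in "$@"; do
    printf '%s\n' "$f" >> "$list"    # one path per line
  done
}
```

For example, make_read_list reads.txt /path/s-400bp_*_R1.fq.gz /path/s-400bp_*_R2.fq.gz followed by minia -in reads.txt -out assembly. The list file itself carries no pairing or insert-size information.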
From the manual:
"For proper assembly, we recommend that you use the Minia-pipeline that runs Minia multiple times, with an iterative multi-k algorithm."
Do you know where this script is?
Should print minia 3.2.0 to stdout and exit with 0.
Question 1:
I used the command 'minia-v3.2.1-bin-Linux/bin/minia -in trim_cat.fq -kmer-size 101 -out-dir minia_assembly -max-memory 2048000' to assemble my reads, but got the error shown in the picture.
Question 2:
I want to use k-mer sizes 31, 41, 51 and 61, but '-kmer-size 31,41,51,61' did not work. Could you help me? Thank you very much!
Best wishes,
Shirley
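The -kmer-size option appears to take a single value, which would explain why a comma-separated list fails. The iterative multi-k approach quoted from the manual above is what the Minia pipeline provides; the sketch below is only a naive alternative that runs Minia independently once per k and does not combine the results. It is a minimal POSIX sh sketch: MINIA, run_each_k and the assembly_k<k> output prefixes are hypothetical names, and only the -in, -kmer-size and -out options already used in these reports are passed.

```shell
#!/bin/sh
# Sketch: one independent Minia run per k-mer size (NOT the iterative
# multi-k algorithm of the Minia pipeline).
# MINIA is a hypothetical variable holding the path to the minia executable.
MINIA=${MINIA:-minia}

# Usage: run_each_k READS K1 [K2 ...]   (run_each_k is a hypothetical name)
run_each_k() {
  reads=$1
  shift
  for k in "$@"; do
    # -kmer-size is given one integer per run; outputs go to assembly_k<k>.*
    "$MINIA" -in "$reads" -kmer-size "$k" -out "assembly_k$k" || return 1
  done
}
```

For example, after setting MINIA=minia-v3.2.1-bin-Linux/bin/minia, running run_each_k trim_cat.fq 31 41 51 61 produces four separate assemblies, one per output prefix. The loop stops at the first failing run so a crash is not silently skipped.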
I'm testing Minia with ~400 input fastq.gz files. I hit a strange segfault and immediately wondered whether it might depend on the number of input files. The input is ~60G, roughly what you might see for a lower-coverage human assembly, but it comes from a reduced representation of the genome (this is for Capsicum, and we're using "genotyping-by-sequencing" data).
Here's the error log:
Minia 3, git commit efef7c7 [907/1955]
bglue_algo params, prefix:dummy.unitigs.fa k:5 threads:32
debug: not deleting glue files
setting storage type to hdf5
[Approximating frequencies of minimizers ] 100 % elapsed: 1 min 34 sec remaining: 0 min 0 sec cpu: 99.8 % mem: [ 36, 36, 123] MB
[DSK: nb solid kmers found : 84023707 ] 100 % elapsed: 49 min 34 sec remaining: 0 min 0 sec cpu: 330.7 % mem: [1866, 6918, 6952] MB
bcalm_algo params, prefix:pepper_pangenome_k51_m3.unitigs.fa k:51 a:3 minsize:10 threads:32 mintype:1
DSK used 1 passes and 608 partitions
prior to queues allocation 14:29:18 memory [current, maxRSS]: [1863, 6952] MB
Starting BCALM2 14:29:18 memory [current, maxRSS]: [1863, 6952] MB
[Iterating DSK partitions ] 0 % elapsed: 0 min 0 sec remaining: 0 min 0 sec
Iterated 711514 kmers, among them 47212 were doubled
In this superbucket (containing 2872 active minimizers),
sum of time spent in lambda's: 5177.6 msecs
longest lambda: 21.5 msecs
tot time of best scheduling of lambdas: 5177.6 msecs
best theoretical speedup: 240.6x
Done with partition 0 14:29:19 memory [current, maxRSS]: [1926, 6952] MB
[Iterating DSK partitions ] 9.87 % elapsed: 0 min 35 sec remaining: 5 min 17 sec
Iterated 332394 kmers, among them 19685 were doubled
Loaded 14123 doubled kmers for partition 61
In this superbucket (containing 539 active minimizers),
sum of time spent in lambda's: 2474.9 msecs
longest lambda: 32.3 msecs
tot time of best scheduling of lambdas: 2474.9 msecs
best theoretical speedup: 76.6x
Done with partition 61 14:29:53 memory [current, maxRSS]: [2018, 6952] MB
[Iterating DSK partitions ] 19.7 % elapsed: 0 min 51 sec remaining: 3 min 29 sec
Iterated 221450 kmers, among them 15343 were doubled
Loaded 13248 doubled kmers for partition 122
In this superbucket (containing 603 active minimizers),
sum of time spent in lambda's: 1531.1 msecs
longest lambda: 21.7 msecs
tot time of best scheduling of lambdas: 1531.1 msecs
best theoretical speedup: 70.6x
Done with partition 122 14:30:10 memory [current, maxRSS]: [2075, 6952] MB
[2] 20638 segmentation fault minia -in kept_fastqs.txt -kmer-size 51 -abundance-min 3 -out
minia -in kept_fastqs.txt -kmer-size 51 -abundance-min 3 -out 10587.55s user 172.51s system 314% cpu 56:59.59 total
Above a certain depth, coronaSPAdes can assemble but Minia cannot, as evidenced by https://serratus-public.s3.amazonaws.com/rce/assembly/report_first_1k.tsv
This might also be true for high-coverage parts of metagenomes.
% minia -in SRR1661276_R1.fastq.gz -out-dir minia
HDF5-DIAG: Error detected in HDF5 (1.8.11) thread 0:
#000: /builds/workspace/tool-minia-build-debian7-64bits-gcc-4.7/gatb-minia/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5F.c line 1500 in H5Fcreate(): unable to create file
major: File accessibilty
minor: Unable to open file
It does not do this if I add -out as well, BUT then it doesn't put the output files in the -out-dir folder, but in . (the current directory) instead.
This is the 2.0.7 binary.
There have been a lot of changes since 2.0.7; can you tag a new release for packaging?
Dr. Chikhi,
I get an error when I run sh INSTALL:
Running simple test... INSTALL: 17: ./simple_test.sh: [[: not found
I think there is a bug in the last line of the INSTALL script (an extra dot). When I corrected it, it seemed to work.
But I know why
HDF5-DIAG: Error detected in HDF5 (1.8.18) thread 0:
#000: /software/minia-v3.2.1-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Dio.c line 173 in H5Dread(): can't read data
major: Dataset
minor: Read failed
#1: /software/minia-v3.2.1-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Dio.c line 554 in H5D__read(): can't read data
major: Dataset
minor: Read failed
#2: /software/minia-v3.2.1-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Dchunk.c line 1875 in H5D__chunk_read(): unable to read raw data chunk
major: Low-level I/O
minor: Read failed
#3: /software/minia-v3.2.1-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Dchunk.c line 2905 in H5D__chunk_lock(): data pipeline read failed
major: Data filters
minor: Filter operation failed
#4: /software/minia-v3.2.1-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Z.c line 1372 in H5Z_pipeline(): filter returned failure during read
major: Data filters
minor: Read failed
#5: /software/minia-v3.2.1-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Zdeflate.c line 125 in H5Z_filter_deflate(): inflate() failed
major: Data filters
minor: Unable to initialize object
Hi there,
I'm using Minia 3 (git commit 099e154, installed a week ago) to assemble a large read set of 2.9T. I got the following segmentation fault:
created vector of hashes, size approx 43573 MB) 02:27:28 memory [current, maxRSS]: [63055, 63057] MB
pass 3/3, 2855611176 unique hashes written to disk, size 21786 MB 02:32:29 memory [current, maxRSS]: [22333, 63057] MB
loaded all unique UF elements (8566991033) into a single vector of size 65360 MB 02:36:12 memory [current, maxRSS]: [87694, 87694] MB
[Building BooPHF] 100 % elapsed: 4 min 48 sec remaining: 0 min 0 sec
Bitarray 40352161792 bits (100.00 %) (array + ranks )
final hash 0 bits (0.00 %) (nb in final hash 0)
UF MPHF constructed (4810 MB) 02:41:03 memory [current, maxRSS]: [25397, 93822] MB
/var/spool/slurm/slurmd/spool/job3088264/slurm_script: line 10: 50997 Segmentation fault (core dumped) minia -in 3_QinN -nb-cores 30 -kmer-size 19 -out QinN
It doesn't seem to be a memory issue. The full log file is attached.
The command I used: minia -in 3_QinN -nb-cores 30 -kmer-size 19 -out QinN
I'd really appreciate it if anyone can help!
Minia.log
Hi,
I am looking for an assembler for a highly heterozygous plant genome (diploid, heterozygosity > 2%, haploid genome size ~3.6G).
How can I use Minia to assemble such a genome?
Best wishes,
Kun
Is there an option to do multithreading in minia?
Why does http://minia.genouest.org/ still show 2.0.3 as the latest version?
Hi,
Are there any plans to implement paired-end read support? Have you ever tried FLASH or shuffleReadsfasta.pl from Velvet?
Thank you in advance.
Mic
Is it possible to export the assembly graph in GFA format?