gatb / minia



Minia is a short-read assembler based on a de Bruijn graph

Home Page: https://gatb.inria.fr/software/minia

License: GNU Affero General Public License v3.0

CMake 5.18% C++ 67.62% Shell 18.42% Python 8.78%


minia's Issues

Can I use minia to assemble RNA-seq data?

Hi,

Can I use minia as an RNA-seq de novo assembler?

Best,
Kun

minia generates non-deterministic assemblies?

Hello,

I'm attempting to use minia to perform local assembly of Linked-Reads data, and realign the obtained contigs to determine breakpoints as a part of a SV calling tool.

However, when running multiple instances of minia, with the same parameters, and on the same input file, it seems to output (slightly) different assemblies.

While all assemblies are very similar, some of them sometimes end up messing up the breakpoints computation (although I can definitely take some of the blame on that problem).

I've tried playing around with the parameters a bit, but couldn't come up with something that would yield deterministic assemblies. So I'm wondering: is that expected behaviour, or am I overlooking something here?

Thanks.

Best,
Pierre
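
A minimal sketch for checking how two runs actually differ. It assumes (this is not confirmed anywhere in the thread) that some of the run-to-run variation may be limited to contig order or strand, which would not matter for realignment. Each contig is reduced to the lexicographically smaller of its two strands, and the sorted lists are compared. The function names are hypothetical helpers, not part of Minia.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def read_fasta(lines):
    """Yield one sequence string per FASTA record from an iterable of lines."""
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if chunks:
                yield "".join(chunks)
            chunks = []
        elif line:
            chunks.append(line)
    if chunks:
        yield "".join(chunks)


def canonical_contigs(lines):
    """Return contigs sorted, each stored as the smaller of its two strands."""
    contigs = []
    for seq in read_fasta(lines):
        seq = seq.upper()
        contigs.append(min(seq, reverse_complement(seq)))
    return sorted(contigs)


def same_assembly(lines_a, lines_b):
    """True if both assemblies hold the same contigs up to order and strand."""
    return canonical_contigs(lines_a) == canonical_contigs(lines_b)
```

Calling `same_assembly(open("run1.contigs.fa"), open("run2.contigs.fa"))` (file names are placeholders) returns `False` only when the sequences themselves differ, which separates real differences from reordering.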

minor typo

when solidity-kind is cutom, specifies

=> custom

What about Haplotype reconstruction and Minia

Hi Minia Creators,

Do you have any idea or clues about the behaviour of Minia on targeted genes in high-ploidy samples (tumors)?

best Regards,
JB

-max-memory ignored

In [DSK: Pass 1/1, Step 2: counting kmers ] the memory limit seems to be ignored. Minia went over the RAM limit and started swapping.

Is 3.2.0 as fast as before?

The old version of Minia is very fast and memory efficient. What about 3.2.0?

h5dump clashes with official version from hdf5 package

Could you rename your h5dump to minia-h5dump?

Log Message

Hello -
I am trying to troubleshoot a genome assembly, and in reading the minia log file I see several thousand instances of "Weird, there was supposed to be an in-neighbor. Maybe there's a loop. Remove this print if it never happensin degree 1" - does this have any impact on the output? If so, how can I mitigate the issue? Thank you.

using 10X data

How can barcoded reads from 10X data be used as input to minia?

Minia 3, git commit fatal: not a git repository

I built https://github.com/GATB/minia/releases/download/v3.2.0/minia-v3.2.0-Source.tar.gz

When I run minia it prints:

Minia 3, git commit fatal: not a git repository (or any of the parent directories): .git

Was hoping for 3.2.0

multiple input files

Can I use multiple -1 -2 --mp-1 --mp-2 flags to load multiple input files, or do I have to cat all files together (that's what I did previously)? If they have to be concatenated, can the --mp-12 and -12 flags be used at the same time?

../gatb-minia-pipeline/gatb --max-memory 150000 --nb-cores 38
--mp-1 /path/s-11kb_1_R1.fq.gz
--mp-2 /path/s-11kb_1_R2.fq.gz
--mp-1 /path/s-11kb_2_R1.fq.gz
--mp-2 /path/s-11kb_2_R2.fq.gz
--mp-1 /path/s-11kb_3_R1.fq.gz
--mp-2 /path/s-11kb_3_R2.fq.gz
--mp-1 /path/s-11kb_4_R1.fq.gz
--mp-2 /path/s-11kb_4_R2.fq.gz
--mp-1 /path/s-11kb_5_R1.fq.gz
--mp-2 /path/s-11kb_5_R2.fq.gz
--mp-1 /path/s-11kb_6_R1.fq.gz
--mp-2 /path/s-11kb_6_R2.fq.gz
--mp-1 /path/s-11kb_7_R1.fq.gz
--mp-2 /path/s-11kb_7_R2.fq.gz
--mp-1 /path/s-11kb_8_R1.fq.gz
--mp-2 /path/s-11kb_8_R2.fq.gz
-1 /path/s-400bp_1_R1.fq.gz
-2 /path/s-400bp_1_R2.fq.gz
-1 /path/s-400bp_2_R1.fq.gz
-2 /path/s-400bp_2_R2.fq.gz
-1 /path/s-400bp_3_R1.fq.gz
-2 /path/s-400bp_3_R2.fq.gz
-1 /path/s-400bp_4_R1.fq.gz
-2 /path/s-400bp_4_R2.fq.gz
-1 /path/s-400bp_5_R1.fq.gz
-2 /path/s-400bp_5_R2.fq.gz
-1 /path/s-400bp_6_R1.fq.gz
-2 /path/s-400bp_6_R2.fq.gz
-1 /path/s-400bp_7_R1.fq.gz
-2 /path/s-400bp_7_R2.fq.gz
-1 /path/s-400bp_8_R1.fq.gz
-2 /path/s-400bp_8_R2.fq.gz
-1 /path/s-500bp_1_R1.fq.gz
-2 /path/s-500bp_1_R2.fq.gz
-1 /path/s-500bp_2_R1.fq.gz
-2 /path/s-500bp_2_R2.fq.gz
-1 /path/s-500bp_3_R1.fq.gz
-2 /path/s-500bp_3_R2.fq.gz
-1 /path/s-500bp_4_R1.fq.gz
-2 /path/s-500bp_4_R2.fq.gz
-1 /path/s-500bp_5_R1.fq.gz
-2 /path/s-500bp_5_R2.fq.gz
-1 /path/s-500bp_6_R1.fq.gz
-2 /path/s-500bp_6_R2.fq.gz
-1 /path/s-500bp_7_R1.fq.gz
-2 /path/s-500bp_7_R2.fq.gz
-1 /path/s-500bp_8_R1.fq.gz
-2 /path/s-500bp_8_R2.fq.gz
--mp-1 /path/s-8kb_1_R1.fq.gz
--mp-2 /path/s-8kb_1_R2.fq.gz
--mp-1 /path/s-8kb_2_R1.fq.gz
--mp-2 /path/s-8kb_2_R2.fq.gz
--mp-1 /path/s-8kb_3_R1.fq.gz
--mp-2 /path/s-8kb_3_R2.fq.gz
--mp-1 /path/s-8kb_4_R1.fq.gz
--mp-2 /path/s-8kb_4_R2.fq.gz
--mp-1 /path/s-8kb_5_R1.fq.gz
--mp-2 /path/s-8kb_5_R2.fq.gz
--mp-1 /path/s-8kb_6_R1.fq.gz
--mp-2 /path/s-8kb_6_R2.fq.gz
--mp-1 /path/s-8kb_7_R1.fq.gz
--mp-2 /path/s-8kb_7_R2.fq.gz
--mp-1 /path/s-8kb_8_R1.fq.gz
--mp-2 /path/s-8kb_8_R2.fq.gz
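
The thread does not say whether the pipeline accepts repeated flags. For Minia itself, the segfault report further down this page runs `minia -in kept_fastqs.txt` on about 400 files, which suggests that `-in` can take a text file listing the read files. The sketch below only builds such a list, one path per line. It is an assumption that this helps with the `gatb` pipeline wrapper, and `build_file_list` is a hypothetical helper.

```python
def build_file_list(paths):
    """Return the text of a list file: one read-file path per line, sorted."""
    return "".join(path + "\n" for path in sorted(paths))


def write_file_list(paths, list_path):
    """Write the list file to disk and return its path."""
    with open(list_path, "w") as handle:
        handle.write(build_file_list(paths))
    return list_path
```

For example, `write_file_list(glob.glob("/path/s-400bp_*_R?.fq.gz"), "reads.txt")` would collect the 400 bp library into `reads.txt` (after `import glob`). Pairing information is not encoded in such a list.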

Can't locate the Minia-pipeline multi kmer script

From the manual:

"For proper assembly, we recommend that you use the Minia-pipeline that runs Minia multiple times, with an iterative multi-k algorithm."

Do you know where this script is?

Add -version flag

Should print minia 3.2.0 to stdout and exit with 0.

how to set several k-mer sizes

Question 1:
I used the command 'minia-v3.2.1-bin-Linux/bin/minia -in trim_cat.fq -kmer-size 101 -out-dir minia_assembly -max-memory 2048000' to assemble, but it failed with the error shown in the attached picture.
Question 2:
I want to use k-mer sizes 31, 41, 51 and 61, but '-kmer-size 31,41,51,61' did not work. Could you help me? Thank you very much!

Best wishes,
Shirley
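
A minimal sketch of one workaround for Question 2, assuming `-kmer-size` takes a single value per run: launch one Minia run per k. It uses only flags that appear elsewhere on this page (`-in`, `-kmer-size`, `-out`). The runs are independent of each other, so this is not the iterative multi-k algorithm of the Minia-pipeline. The `minia_commands` and `run_all` helpers and the output prefix are hypothetical.

```python
import subprocess


def minia_commands(reads, kmer_sizes, prefix="assembly"):
    """Build one minia command line (as an argument list) per k-mer size."""
    return [
        ["minia", "-in", reads, "-kmer-size", str(k), "-out", f"{prefix}_k{k}"]
        for k in kmer_sizes
    ]


def run_all(commands):
    """Run the commands one after another, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

`run_all(minia_commands("trim_cat.fq", [31, 41, 51, 61]))` would then produce one set of outputs per k, each under its own prefix, provided `minia` is on the `PATH`.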

strange segfault

I'm testing minia using ~400 input fastq.gz files. I observed a strange segfault and was immediately curious if there might be something dependent on the number of input files. The input is ~60G or so, around what might be normal for a lower coverage human assembly, but derived from a reduced representation of the genome (this is for Capsicum, and we're using "genotyping-by-sequencing" data).

Here's the error log:

Minia 3, git commit efef7c7
bglue_algo params, prefix:dummy.unitigs.fa k:5 threads:32
debug: not deleting glue files
setting storage type to hdf5
[Approximating frequencies of minimizers ]  100  %   elapsed:   1 min 34 sec   remaining:   0 min 0  sec   cpu:  99.8 %   mem: [  36,   36,  123] MB
[DSK: nb solid kmers found : 84023707    ]  100  %   elapsed:  49 min 34 sec   remaining:   0 min 0  sec   cpu: 330.7 %   mem: [1866, 6918, 6952] MB
bcalm_algo params, prefix:pepper_pangenome_k51_m3.unitigs.fa k:51 a:3 minsize:10 threads:32 mintype:1
DSK used 1 passes and 608 partitions
prior to queues allocation                      14:29:18     memory [current, maxRSS]: [1863, 6952] MB
Starting BCALM2                                 14:29:18     memory [current, maxRSS]: [1863, 6952] MB
[Iterating DSK partitions                ]  0    %   elapsed:   0 min 0  sec   remaining:   0 min 0  sec
Iterated 711514 kmers, among them 47212 were doubled

In this superbucket (containing 2872 active minimizers),
                  sum of time spent in lambda's: 5177.6 msecs
                                 longest lambda: 21.5 msecs
         tot time of best scheduling of lambdas: 5177.6 msecs
                       best theoretical speedup: 240.6x
Done with partition 0                           14:29:19     memory [current, maxRSS]: [1926, 6952] MB
[Iterating DSK partitions                ]  9.87 %   elapsed:   0 min 35 sec   remaining:   5 min 17 sec
Iterated 332394 kmers, among them 19685 were doubled
Loaded 14123 doubled kmers for partition 61

In this superbucket (containing 539 active minimizers),
                  sum of time spent in lambda's: 2474.9 msecs
                                 longest lambda: 32.3 msecs
         tot time of best scheduling of lambdas: 2474.9 msecs
                       best theoretical speedup: 76.6x
Done with partition 61                          14:29:53     memory [current, maxRSS]: [2018, 6952] MB
[Iterating DSK partitions                ]  19.7 %   elapsed:   0 min 51 sec   remaining:   3 min 29 sec
Iterated 221450 kmers, among them 15343 were doubled
Loaded 13248 doubled kmers for partition 122

In this superbucket (containing 603 active minimizers),
                  sum of time spent in lambda's: 1531.1 msecs
                                 longest lambda: 21.7 msecs
         tot time of best scheduling of lambdas: 1531.1 msecs
                       best theoretical speedup: 70.6x
Done with partition 122                         14:30:10     memory [current, maxRSS]: [2075, 6952] MB
[2]    20638 segmentation fault  minia -in kept_fastqs.txt -kmer-size 51 -abundance-min 3 -out
minia -in kept_fastqs.txt -kmer-size 51 -abundance-min 3 -out   10587.55s user 172.51s system 314% cpu 56:59.59 total

minia doesn't assemble high-coverage regions

As evidenced by https://serratus-public.s3.amazonaws.com/rce/assembly/report_first_1k.tsv,
above a certain depth coronaSPAdes can assemble and Minia cannot. This might also be true for high-coverage parts of metagenomes.

minia without "-out" starts running then dies with HDF5 error

% minia -in SRR1661276_R1.fastq.gz -out-dir minia

HDF5-DIAG: Error detected in HDF5 (1.8.11) thread 0:
  #000: /builds/workspace/tool-minia-build-debian7-64bits-gcc-4.7/gatb-minia/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5F.c line 1500 in H5Fcreate(): unable to create file
    major: File accessibilty
    minor: Unable to open file

This does not happen if I also add -out, but then the output files are written to the current directory (.) instead of the -out-dir folder.

This is the 2.0.7 binary.

Tag a new release

There have been a lot of changes since 2.0.7. Can you tag a new release for packaging?

Install fails

Dr. Chikhi,

I get an error when I run sh INSTALL

Running simple test... INSTALL: 17: ./simple_test.sh: [[: not found

I think there is a bug in the last line of the INSTALL script (an extra dot). When I corrected it, it seems to work.

An error occurred while running

But I know why

HDF5-DIAG: Error detected in HDF5 (1.8.18) thread 0:
#000: /software/minia-v3.2.1-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Dio.c line 173 in H5Dread(): can't read data
major: Dataset
minor: Read failed
#1: /software/minia-v3.2.1-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Dio.c line 554 in H5D__read(): can't read data
major: Dataset
minor: Read failed
#2: /software/minia-v3.2.1-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Dchunk.c line 1875 in H5D__chunk_read(): unable to read raw data chunk
major: Low-level I/O
minor: Read failed
#3: /software/minia-v3.2.1-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Dchunk.c line 2905 in H5D__chunk_lock(): data pipeline read failed
major: Data filters
minor: Filter operation failed
#4: /software/minia-v3.2.1-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Z.c line 1372 in H5Z_pipeline(): filter returned failure during read
major: Data filters
minor: Read failed
#5:/software/minia-v3.2.1-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Zdeflate.c line 125 in H5Z_filter_deflate(): inflate() failed
major: Data filters
minor: Unable to initialize object

Segmentation fault when using Minia 3

Hi there,

I'm using Minia 3 (git commit 099e154, installed a week ago) to assemble a large read set of 2.9T. I got the following segmentation fault:

created vector of hashes, size approx 43573 MB) 02:27:28 memory [current, maxRSS]: [63055, 63057] MB
pass 3/3, 2855611176 unique hashes written to disk, size 21786 MB 02:32:29 memory [current, maxRSS]: [22333, 63057] MB
loaded all unique UF elements (8566991033) into a single vector of size 65360 MB 02:36:12 memory [current, maxRSS]: [87694, 87694] MB
[Building BooPHF] 100 % elapsed: 4 min 48 sec remaining: 0 min 0 sec
Bitarray 40352161792 bits (100.00 %) (array + ranks )
final hash 0 bits (0.00 %) (nb in final hash 0)
UF MPHF constructed (4810 MB) 02:41:03 memory [current, maxRSS]: [25397, 93822] MB
/var/spool/slurm/slurmd/spool/job3088264/slurm_script: line 10: 50997 Segmentation fault (core dumped) minia -in 3_QinN -nb-cores 30 -kmer-size 19 -out QinN

Seems that it's not a memory issue. Full log file is attached.

The command I used: minia -in 3_QinN -nb-cores 30 -kmer-size 19 -out QinN

Really appreciate it if anyone can help!
Minia.log
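
One way to check the memory hypothesis is to pull the peak `maxRSS` out of the log and compare it with the memory granted to the job. The sketch below assumes the log keeps the `memory [current, maxRSS]: [a, b] MB` layout shown in the excerpt above. `peak_memory_mb` is a hypothetical helper, not a Minia tool.

```python
import re

# Matches e.g. "memory [current, maxRSS]: [25397, 93822] MB"
MEMORY_RE = re.compile(r"memory \[current, maxRSS\]: \[\s*(\d+),\s*(\d+)\] MB")


def peak_memory_mb(log_lines):
    """Return the largest maxRSS value (in MB) found in the log, or None."""
    peak = None
    for line in log_lines:
        match = MEMORY_RE.search(line)
        if match:
            max_rss = int(match.group(2))
            if peak is None or max_rss > peak:
                peak = max_rss
    return peak
```

Running `peak_memory_mb(open("Minia.log"))` and comparing the result with the scheduler's memory limit would show whether the run came close to it. A peak well below the limit would support the reporter's reading that memory is not the cause.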

Is minia suitable for high heterozygous rate plant genome?

Hi,

I am looking for assemblers that can handle a highly heterozygous plant genome (diploid, het rate > 2%, haploid genome size ~3.6G).
I would like to know how to use minia to assemble such a genome.

Best wishes,
Kun

Mic

Assembly Graph

Is it possible to export the assembly graph in GFA format?
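
The thread gives no answer. As a rough sketch, the conversion can be done from a FASTA file, assuming its headers carry link tags of the form `L:<orientation>:<target id>:<orientation>`, as in BCALM 2 unitig output. Whether a given Minia output file carries these tags is an assumption that should be checked on the file itself. Overlaps are written as k-1, and `fasta_to_gfa` is a hypothetical helper.

```python
def fasta_to_gfa(lines, k):
    """Convert FASTA lines with BCALM-style 'L:' header tags to GFA 1 text."""
    segments = []  # list of (name, header tags, sequence chunks)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            segments.append((fields[0], fields[1:], []))
        elif segments:
            segments[-1][2].append(line)

    out = ["H\tVN:Z:1.0"]
    links = []
    for name, tags, chunks in segments:
        out.append(f"S\t{name}\t{''.join(chunks)}")
        for tag in tags:
            parts = tag.split(":")
            # Only tags shaped like L:+:12:- describe an edge to another node.
            if len(parts) == 4 and parts[0] == "L":
                _, from_orient, to_name, to_orient = parts
                links.append(
                    f"L\t{name}\t{from_orient}\t{to_name}\t{to_orient}\t{k - 1}M"
                )
    return "\n".join(out + links) + "\n"
```

The resulting text can be written to a `.gfa` file and opened in a graph viewer. Other header tags (length, abundance) are dropped here to keep the sketch short.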
