gatb / minia

Minia is a short-read assembler based on a de Bruijn graph

Home Page: https://gatb.inria.fr/software/minia

License: GNU Affero General Public License v3.0

CMake 5.18% C++ 67.62% Shell 18.42% Python 8.78%

minia's People

Contributors

cdeltel, genscale-admin, rchikhi, rizkg

minia's Issues

minia generates non-deterministic assemblies?

Hello,

I'm attempting to use minia to perform local assembly of Linked-Reads data, and to realign the resulting contigs to determine breakpoints as part of an SV calling tool.

However, when running multiple instances of minia, with the same parameters, and on the same input file, it seems to output (slightly) different assemblies.

While all assemblies are very similar, some of them sometimes end up messing up the breakpoints computation (although I can definitely take some of the blame on that problem).

I've tried playing around with the parameters a bit, but couldn't come up with something that would yield deterministic assemblies. So I'm wondering, is that an expected behaviour, or maybe am I overlooking something here?

Thanks.

Best,
Pierre
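
One way to tell whether two runs really differ, rather than just listing the same contigs in a different order or on the opposite strand, is to compare the assemblies as multisets of canonical sequences. The following is a minimal sketch under that assumption; the function names and file paths are hypothetical and not part of minia.

```python
from collections import Counter

# Translation table used to complement nucleotides (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(seq):
    """Return the lexicographically smaller of a sequence and its reverse complement."""
    seq = seq.upper()
    return min(seq, reverse_complement(seq))


def read_fasta(path):
    """Yield the sequences of a FASTA file, joining wrapped lines."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield "".join(chunks)
                name, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if name is not None:
        yield "".join(chunks)


def contig_multiset(path):
    """Count canonical contig sequences, ignoring order, names and strand."""
    return Counter(canonical(seq) for seq in read_fasta(path))


def compare_assemblies(path_a, path_b):
    """Return the contigs found only in A and only in B (as Counters)."""
    a, b = contig_multiset(path_a), contig_multiset(path_b)
    return a - b, b - a
```

If both returned counters are empty, the two runs produced the same contigs up to order and strand; otherwise the leftover sequences show where the runs diverge.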

minor typo

when solidity-kind is cutom, specifies

=> custom

-max-memory ignored

During the step [DSK: Pass 1/1, Step 2: counting kmers ], the -max-memory limit seems to be ignored: Minia exceeded the RAM limit and started swapping.

Log Message

Hello -
I am trying to troubleshoot a genome assembly, and in reading the minia log file I see several thousand instances of "Weird, there was supposed to be an in-neighbor. Maybe there's a loop. Remove this print if it never happensin degree 1". Does this have any impact on the output? If so, how can I mitigate the issue? Thank you.

using 10X data

How can barcoded reads from 10X data be used as input to minia?

Minia 3, git commit fatal: not a git repository

I built https://github.com/GATB/minia/releases/download/v3.2.0/minia-v3.2.0-Source.tar.gz

When I run minia it prints:

Minia 3, git commit fatal: not a git repository (or any of the parent directories): .git

I was hoping it would report 3.2.0.

multiple input files

Can I use multiple -1, -2, --mp-1 and --mp-2 flags to load multiple input files, or do I have to cat all the files together (which is what I did previously)? If they have to be concatenated, can the --mp-12 and -12 flags both be used at the same time?

../gatb-minia-pipeline/gatb --max-memory 150000 --nb-cores 38
--mp-1 /path/s-11kb_1_R1.fq.gz
--mp-2 /path/s-11kb_1_R2.fq.gz
--mp-1 /path/s-11kb_2_R1.fq.gz
--mp-2 /path/s-11kb_2_R2.fq.gz
--mp-1 /path/s-11kb_3_R1.fq.gz
--mp-2 /path/s-11kb_3_R2.fq.gz
--mp-1 /path/s-11kb_4_R1.fq.gz
--mp-2 /path/s-11kb_4_R2.fq.gz
--mp-1 /path/s-11kb_5_R1.fq.gz
--mp-2 /path/s-11kb_5_R2.fq.gz
--mp-1 /path/s-11kb_6_R1.fq.gz
--mp-2 /path/s-11kb_6_R2.fq.gz
--mp-1 /path/s-11kb_7_R1.fq.gz
--mp-2 /path/s-11kb_7_R2.fq.gz
--mp-1 /path/s-11kb_8_R1.fq.gz
--mp-2 /path/s-11kb_8_R2.fq.gz
-1 /path/s-400bp_1_R1.fq.gz
-2 /path/s-400bp_1_R2.fq.gz
-1 /path/s-400bp_2_R1.fq.gz
-2 /path/s-400bp_2_R2.fq.gz
-1 /path/s-400bp_3_R1.fq.gz
-2 /path/s-400bp_3_R2.fq.gz
-1 /path/s-400bp_4_R1.fq.gz
-2 /path/s-400bp_4_R2.fq.gz
-1 /path/s-400bp_5_R1.fq.gz
-2 /path/s-400bp_5_R2.fq.gz
-1 /path/s-400bp_6_R1.fq.gz
-2 /path/s-400bp_6_R2.fq.gz
-1 /path/s-400bp_7_R1.fq.gz
-2 /path/s-400bp_7_R2.fq.gz
-1 /path/s-400bp_8_R1.fq.gz
-2 /path/s-400bp_8_R2.fq.gz
-1 /path/s-500bp_1_R1.fq.gz
-2 /path/s-500bp_1_R2.fq.gz
-1 /path/s-500bp_2_R1.fq.gz
-2 /path/s-500bp_2_R2.fq.gz
-1 /path/s-500bp_3_R1.fq.gz
-2 /path/s-500bp_3_R2.fq.gz
-1 /path/s-500bp_4_R1.fq.gz
-2 /path/s-500bp_4_R2.fq.gz
-1 /path/s-500bp_5_R1.fq.gz
-2 /path/s-500bp_5_R2.fq.gz
-1 /path/s-500bp_6_R1.fq.gz
-2 /path/s-500bp_6_R2.fq.gz
-1 /path/s-500bp_7_R1.fq.gz
-2 /path/s-500bp_7_R2.fq.gz
-1 /path/s-500bp_8_R1.fq.gz
-2 /path/s-500bp_8_R2.fq.gz
--mp-1 /path/s-8kb_1_R1.fq.gz
--mp-2 /path/s-8kb_1_R2.fq.gz
--mp-1 /path/s-8kb_2_R1.fq.gz
--mp-2 /path/s-8kb_2_R2.fq.gz
--mp-1 /path/s-8kb_3_R1.fq.gz
--mp-2 /path/s-8kb_3_R2.fq.gz
--mp-1 /path/s-8kb_4_R1.fq.gz
--mp-2 /path/s-8kb_4_R2.fq.gz
--mp-1 /path/s-8kb_5_R1.fq.gz
--mp-2 /path/s-8kb_5_R2.fq.gz
--mp-1 /path/s-8kb_6_R1.fq.gz
--mp-2 /path/s-8kb_6_R2.fq.gz
--mp-1 /path/s-8kb_7_R1.fq.gz
--mp-2 /path/s-8kb_7_R2.fq.gz
--mp-1 /path/s-8kb_8_R1.fq.gz
--mp-2 /path/s-8kb_8_R2.fq.gz

Can't locate the Minia-pipeline multi kmer script

From the manual:

"For proper assembly, we recommend that you use the Minia-pipeline that runs Minia multiple times, with an iterative multi-k algorithm."

Do you know where this script is?

how to set several kmer sizes

Question 1:
I used the command 'minia-v3.2.1-bin-Linux/bin/minia -in trim_cat.fq -kmer-size 101 -out-dir minia_assembly -max-memory 2048000' to run the assembly, but it failed with the error shown in the picture.
Question 2:
I want to set the kmer size to 31, 41, 51 and 61, but '-kmer-size 31,41,51,61' did not work. Could you help me? Thank you very much!

Best wishes,
Shirley
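
Since -kmer-size takes a single value per run, one way to approximate a multi-k assembly is to run minia once per k and feed the contigs of each round into the next. The sketch below only plans the runs and does not execute them. It rests on several assumptions: that the input given to -in can be a text file listing one path per line, that each run writes its contigs to `<prefix>.contigs.fa`, and that re-adding contigs as extra input is an acceptable stand-in for what the Minia-pipeline does. The function name and the `asm` prefix are hypothetical.

```python
def plan_multik(reads, kmer_sizes, prefix="asm"):
    """Plan one minia run per k, chaining each round's contigs into the next.

    Returns a list of dicts holding, for every round, the list file to write,
    the input paths it should contain, and the command line to run.
    """
    steps = []
    previous_contigs = None
    for k in kmer_sizes:
        inputs = list(reads)
        if previous_contigs is not None:
            # Assumption: contigs of the previous round are added as extra input.
            inputs.append(previous_contigs)
        out_prefix = f"{prefix}_k{k}"
        list_file = f"{out_prefix}.inputs.txt"
        command = ["minia", "-in", list_file,
                   "-kmer-size", str(k), "-out", out_prefix]
        steps.append({"list_file": list_file,
                      "inputs": inputs,
                      "command": command})
        # Assumption: minia names its contig file <prefix>.contigs.fa.
        previous_contigs = f"{out_prefix}.contigs.fa"
    return steps


if __name__ == "__main__":
    for step in plan_multik(["trim_cat.fq"], [31, 41, 51, 61]):
        print(step["list_file"], "<-", ", ".join(step["inputs"]))
        print("  " + " ".join(step["command"]))
```

To run a round, write its `inputs` to its `list_file` (one path per line) and then launch the command, for example with `subprocess.run(step["command"], check=True)`.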

strange segfault

I'm testing minia using ~400 input fastq.gz files. I observed a strange segfault and was immediately curious if there might be something dependent on the number of input files. The input is ~60G or so, around what might be normal for a lower coverage human assembly, but derived from a reduced representation of the genome (this is for Capsicum, and we're using "genotyping-by-sequencing" data).

Here's the error log:

Minia 3, git commit efef7c7
bglue_algo params, prefix:dummy.unitigs.fa k:5 threads:32
debug: not deleting glue files
setting storage type to hdf5
[Approximating frequencies of minimizers ]  100  %   elapsed:   1 min 34 sec   remaining:   0 min 0  sec   cpu:  99.8 %   mem: [  36,   36,  123] MB
[DSK: nb solid kmers found : 84023707    ]  100  %   elapsed:  49 min 34 sec   remaining:   0 min 0  sec   cpu: 330.7 %   mem: [1866, 6918, 6952] MB
bcalm_algo params, prefix:pepper_pangenome_k51_m3.unitigs.fa k:51 a:3 minsize:10 threads:32 mintype:1
DSK used 1 passes and 608 partitions
prior to queues allocation                      14:29:18     memory [current, maxRSS]: [1863, 6952] MB
Starting BCALM2                                 14:29:18     memory [current, maxRSS]: [1863, 6952] MB
[Iterating DSK partitions                ]  0    %   elapsed:   0 min 0  sec   remaining:   0 min 0  sec
Iterated 711514 kmers, among them 47212 were doubled

In this superbucket (containing 2872 active minimizers),
                  sum of time spent in lambda's: 5177.6 msecs
                                 longest lambda: 21.5 msecs
         tot time of best scheduling of lambdas: 5177.6 msecs
                       best theoretical speedup: 240.6x
Done with partition 0                           14:29:19     memory [current, maxRSS]: [1926, 6952] MB
[Iterating DSK partitions                ]  9.87 %   elapsed:   0 min 35 sec   remaining:   5 min 17 sec
Iterated 332394 kmers, among them 19685 were doubled
Loaded 14123 doubled kmers for partition 61

In this superbucket (containing 539 active minimizers),
                  sum of time spent in lambda's: 2474.9 msecs
                                 longest lambda: 32.3 msecs
         tot time of best scheduling of lambdas: 2474.9 msecs
                       best theoretical speedup: 76.6x
Done with partition 61                          14:29:53     memory [current, maxRSS]: [2018, 6952] MB
[Iterating DSK partitions                ]  19.7 %   elapsed:   0 min 51 sec   remaining:   3 min 29 sec
Iterated 221450 kmers, among them 15343 were doubled
Loaded 13248 doubled kmers for partition 122

In this superbucket (containing 603 active minimizers),
                  sum of time spent in lambda's: 1531.1 msecs
                                 longest lambda: 21.7 msecs
         tot time of best scheduling of lambdas: 1531.1 msecs
                       best theoretical speedup: 70.6x
Done with partition 122                         14:30:10     memory [current, maxRSS]: [2075, 6952] MB
[2]    20638 segmentation fault  minia -in kept_fastqs.txt -kmer-size 51 -abundance-min 3 -out
minia -in kept_fastqs.txt -kmer-size 51 -abundance-min 3 -out   10587.55s user 172.51s system 314% cpu 56:59.59 total
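
To check whether the crash depends on the number of input files, one option is to rerun the same command on list files of increasing size. The sketch below writes such list files. It assumes that `kept_fastqs.txt` is a plain text file with one path per line, and the helper name and output file names are hypothetical.

```python
from pathlib import Path


def write_subset_lists(fastq_paths, sizes, out_dir):
    """Write one list file per subset size and return {actual_size: list_path}.

    Each list file contains the first `size` paths (in sorted order), one per
    line, so that a larger subset always includes the smaller ones.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    ordered = sorted(fastq_paths)
    lists = {}
    for size in sizes:
        subset = ordered[:size]
        list_path = out_dir / f"inputs_first_{len(subset)}.txt"
        list_path.write_text("".join(path + "\n" for path in subset))
        lists[len(subset)] = str(list_path)
    return lists
```

Each list file can then replace `kept_fastqs.txt` in the original command (same -kmer-size and -abundance-min), which should show whether the segfault only appears above some number of inputs.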

minia without "-out" starts running then dies with HDF5 error

% minia -in SRR1661276_R1.fastq.gz -out-dir minia

HDF5-DIAG: Error detected in HDF5 (1.8.11) thread 0:
  #000: /builds/workspace/tool-minia-build-debian7-64bits-gcc-4.7/gatb-minia/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5F.c line 1500 in H5Fcreate(): unable to create file
    major: File accessibilty
    minor: Unable to open file

It does not do this if I add -out as well, but then it puts the output files in the current directory (.) instead of the -out-dir folder.

This is the 2.0.7 binary.

Tag a new release

There have been a lot of changes since 2.0.7. Could you tag a new release for packaging?

Install fails

Dr. Chikhi,

I get an error when I run sh INSTALL

Running simple test... INSTALL: 17: ./simple_test.sh: [[: not found

I think there is a bug in the last line of the INSTALL script (an extra dot). When I corrected it, it seems to work.

An error occurred while running

But I know why

HDF5-DIAG: Error detected in HDF5 (1.8.18) thread 0:
#000: /software/minia-v3.2.1-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Dio.c line 173 in H5Dread(): can't read data
major: Dataset
minor: Read failed
#1: /software/minia-v3.2.1-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Dio.c line 554 in H5D__read(): can't read data
major: Dataset
minor: Read failed
#2: /software/minia-v3.2.1-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Dchunk.c line 1875 in H5D__chunk_read(): unable to read raw data chunk
major: Low-level I/O
minor: Read failed
#3: /software/minia-v3.2.1-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Dchunk.c line 2905 in H5D__chunk_lock(): data pipeline read failed
major: Data filters
minor: Filter operation failed
#4: /software/minia-v3.2.1-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Z.c line 1372 in H5Z_pipeline(): filter returned failure during read
major: Data filters
minor: Read failed
#5:/software/minia-v3.2.1-Source/thirdparty/gatb-core/gatb-core/thirdparty/hdf5/src/H5Zdeflate.c line 125 in H5Z_filter_deflate(): inflate() failed
major: Data filters
minor: Unable to initialize object

Segmentation fault when using Minia 3

Hi there,

I'm using Minia 3 (git commit 099e154, installed a week ago) to assemble a large read set of 2.9T. I got the following segmentation fault:

created vector of hashes, size approx 43573 MB) 02:27:28 memory [current, maxRSS]: [63055, 63057] MB
pass 3/3, 2855611176 unique hashes written to disk, size 21786 MB 02:32:29 memory [current, maxRSS]: [22333, 63057] MB
loaded all unique UF elements (8566991033) into a single vector of size 65360 MB 02:36:12 memory [current, maxRSS]: [87694, 87694] MB
[Building BooPHF] 100 % elapsed: 4 min 48 sec remaining: 0 min 0 sec
Bitarray 40352161792 bits (100.00 %) (array + ranks )
final hash 0 bits (0.00 %) (nb in final hash 0)
UF MPHF constructed (4810 MB) 02:41:03 memory [current, maxRSS]: [25397, 93822] MB
/var/spool/slurm/slurmd/spool/job3088264/slurm_script: line 10: 50997 Segmentation fault (core dumped) minia -in 3_QinN -nb-cores 30 -kmer-size 19 -out QinN

It seems that it's not a memory issue. The full log file is attached.

The command I used: minia -in 3_QinN -nb-cores 30 -kmer-size 19 -out QinN

I would really appreciate it if anyone could help!
Minia.log

Paired-end read support

Hi,
Are there any plans to implement paired-end read support? Have you ever tried FLASH or
shuffleSequences_fasta.pl from Velvet?

Thank you in advance.

Mic

Assembly Graph

Is it possible to export the assembly graph in GFA format?
