I should note that this machine has 256G of RAM and I don't appear to have any limitations as far as disk space.
from minia.
Hi Erik!
Curious. That's definitely not tied to the number of fastq files: at this stage minia only considers the counted kmers and no longer the input files. This is the first time I've seen this stage fail; it could be due to some special unhandled case in the graph structure.
I'd be happy to assist with debugging.
- Can the data be sent by any chance?
- Does it complete with a different kmer size?
from minia.
It does complete with a shorter kmer size (41) and the same abundance. I would love to share the data but I will need to ask my collaborators. I will send you an email if I can share. I definitely understand that's the only way to really resolve this problem. Interestingly, the k=41 assembly has very few contigs (on the order of a few hundred kb) while the unitig set is several hundred Mb. Also, this data is a little strange: by library design (GBS/RadSeq) it consists of many small fragments.
from minia.
I see. Well, I'll keep an eye out for that email. Otherwise, just let me know if you encounter that bug again in another dataset. I'd like to get a sense of whether this is a one-of-a-kind thing.
from minia.
Hello,
I am also experiencing a segmentation fault in minia at the "Iterating DSK partitions" step at k=41 (see the log file below). I am using a computer cluster with nearly 2TB of RAM and plenty of disk space. Minia runs without issue on the GATB test dataset. Any help?
Thanks!
(2018-08-13 22:18:53) GATB-pipeline starting
(2018-08-13 22:18:53) Command line: /home/pavel/bin/gatb-minia-pipeline/gatb -1 /home/pavel/Run_1837_Tyrophagus_putrescentiae_Mex/Sample_79070/2a-removeDuplicates_clumpify/79070_TGACCA_S1_L003_R1_001_bbduk_no_adapters_Results/79070_TGACCA_S1_L003_R1_001_bbduk_no_adapters_deduplicated.fastq.gz -2 /home/pavel/Run_1837_Tyrophagus_putrescentiae_Mex/Sample_79070/2a-removeDuplicates_clumpify/79070_TGACCA_S1_L003_R1_001_bbduk_no_adapters_Results/79070_TGACCA_S1_L003_R2_001_bbduk_no_adapters_deduplicated.fastq.gz -o 79070_TGACCA_S1_L003_R1_001_bbduk_no_adapters_deduplicated.assembly
(2018-08-13 22:18:53) Setting maximum kmer length to: 151 bp
(2018-08-13 22:18:53) Multi-k values and cutoffs: [(21, 2), (41, 2), (61, 2), (81, 2), (101, 2), (121, 2), (141, 2)]
(2018-08-13 22:18:53) Minia assembling at k=21 min_abundance=2
(2018-08-13 22:18:53) Execution of 'minia/minia'. Command line:
/home/pavel/bin/gatb-minia-pipeline/tools/memused /home/pavel/bin/gatb-minia-pipeline/minia/minia -in 79070_TGACCA_S1_L003_R1_001_bbduk_no_adapters_deduplicated.assembly.list_reads -kmer-size 21 -abundance-min 2 -out 79070_TGACCA_S1_L003_R1_001_bbduk_no_adapters_deduplicated.assembly_k21
(2018-08-14 03:36:01) Finished Minia k=21
(2018-08-14 03:36:01) Minia assembling at k=41 min_abundance=2
(2018-08-14 03:36:01) Execution of 'minia/minia'. Command line:
/home/pavel/bin/gatb-minia-pipeline/tools/memused /home/pavel/bin/gatb-minia-pipeline/minia/minia -in 79070_TGACCA_S1_L003_R1_001_bbduk_no_adapters_deduplicated.assembly.list_reads -kmer-size 41 -abundance-min 2 -out 79070_TGACCA_S1_L003_R1_001_bbduk_no_adapters_deduplicated.assembly_k41
(2018-08-14 05:06:33) Execution of 'minia/minia' failed. Command line:
/home/pavel/bin/gatb-minia-pipeline/tools/memused /home/pavel/bin/gatb-minia-pipeline/minia/minia -in 79070_TGACCA_S1_L003_R1_001_bbduk_no_adapters_deduplicated.assembly.list_reads -kmer-size 41 -abundance-min 2 -out 79070_TGACCA_S1_L003_R1_001_bbduk_no_adapters_deduplicated.assembly_k41
pavel@sagarana:~/Run_1837_Tyrophagus_putrescentiae_Mex/Sample_79070/5d-gatb-pipeline/79070_TGACCA_S1_L003_R1_001_bbduk_no_adapters_deduplicated> cat 0-gatb_79070_TGACCA_S1_L003_R1_001_bbduk_no_adapters_deduplicated.pbs.e108512
[Approximating frequencies of minimizers ] 100 % elapsed: 0 min 48 sec remaining: 0 min 0 sec cpu: 99.9 % mem: [ 20, 20, 20] MB
[DSK: nb solid kmers found : 595537654 ] 212 % elapsed: 71 min 3 sec remaining: 0 min 0 sec cpu: 1813.0 % mem: [7845, 8032, 8050] MB
[Iterating DSK partitions ] 99.7 % elapsed: 21 min 6 sec remaining: 0 min 4 sec
[Building BooPHF] 100 % elapsed: 0 min 11 sec remaining: 0 min 0 sec
[removing tips, pass 1 ] 100 % elapsed: 8 min 49 sec remaining: 0 min 0 sec cpu: 25282.3 % mem: [28646, 28646, 33965] MB
[removing tips, pass 2 ] 100 % elapsed: 0 min 39 sec remaining: 0 min 0 sec cpu: 24498.1 % mem: [24143, 28675, 33965] MB
[removing tips, pass 3 ] 100 % elapsed: 0 min 21 sec remaining: 0 min 0 sec cpu: 7850.8 % mem: [24082, 24134, 33965] MB
[removing tips, pass 4 ] 100 % elapsed: 0 min 22 sec remaining: 0 min 0 sec cpu: 8396.4 % mem: [24082, 24082, 33965] MB
[removing tips, pass 5 ] 100 % elapsed: 0 min 21 sec remaining: 0 min 0 sec cpu: 7820.0 % mem: [24082, 24082, 33965] MB
[removing bulges, pass 1 ] 100 % elapsed: 10 min 29 sec remaining: 0 min 0 sec cpu: 22194.7 % mem: [24194, 24194, 33965] MB
[removing bulges, pass 2 ] 100 % elapsed: 9 min 7 sec remaining: 0 min 0 sec cpu: 23147.3 % mem: [23158, 24203, 33965] MB
[removing bulges, pass 3 ] 100 % elapsed: 9 min 10 sec remaining: 0 min 0 sec cpu: 23453.0 % mem: [22105, 23163, 33965] MB
[removing bulges, pass 4 ] 100 % elapsed: 9 min 12 sec remaining: 0 min 0 sec cpu: 23583.9 % mem: [21399, 22098, 33965] MB
[removing bulges, pass 5 ] 100 % elapsed: 9 min 44 sec remaining: 0 min 0 sec cpu: 23581.3 % mem: [20617, 21390, 33965] MB
[removing ec, pass 1 ] 100 % elapsed: 5 min 10 sec remaining: 0 min 0 sec cpu: 24824.4 % mem: [21343, 21343, 33965] MB
[removing ec, pass 2 ] 100 % elapsed: 1 min 4 sec remaining: 0 min 0 sec cpu: 20397.3 % mem: [20668, 21375, 33965] MB
[removing ec, pass 3 ] 100 % elapsed: 0 min 27 sec remaining: 0 min 0 sec cpu: 22962.5 % mem: [20167, 20698, 33965] MB
[removing ec, pass 4 ] 100 % elapsed: 0 min 27 sec remaining: 0 min 0 sec cpu: 23310.6 % mem: [19924, 20170, 33965] MB
[removing ec, pass 5 ] 100 % elapsed: 0 min 28 sec remaining: 0 min 0 sec cpu: 23301.5 % mem: [19860, 19925, 33965] MB
[removing tips, pass 6 ] 100 % elapsed: 0 min 55 sec remaining: 0 min 0 sec cpu: 25142.7 % mem: [19974, 19974, 33965] MB
[removing bulges, pass 6 ] 100 % elapsed: 1 min 36 sec remaining: 0 min 0 sec cpu: 22626.6 % mem: [19948, 20002, 33965] MB
[removing ec, pass 6 ] 100 % elapsed: 0 min 27 sec remaining: 0 min 0 sec cpu: 22708.7 % mem: [19920, 19947, 33965] MB
[removing tips, pass 7 ] 100 % elapsed: 0 min 21 sec remaining: 0 min 0 sec cpu: 24399.3 % mem: [19909, 19921, 33965] MB
[removing bulges, pass 7 ] 100 % elapsed: 1 min 37 sec remaining: 0 min 0 sec cpu: 23080.5 % mem: [19853, 19911, 33965] MB
[removing ec, pass 7 ] 100 % elapsed: 0 min 26 sec remaining: 0 min 0 sec cpu: 23476.6 % mem: [19837, 19848, 33965] MB
[removing tips, pass 8 ] 100 % elapsed: 0 min 16 sec remaining: 0 min 0 sec cpu: 8285.7 % mem: [19837, 19841, 33965] MB
[removing bulges, pass 8 ] 100 % elapsed: 1 min 37 sec remaining: 0 min 0 sec cpu: 23144.2 % mem: [19833, 19837, 33965] MB
[removing ec, pass 8 ] 100 % elapsed: 0 min 26 sec remaining: 0 min 0 sec cpu: 23530.7 % mem: [19827, 19828, 33965] MB
[removing tips, pass 9 ] 100 % elapsed: 0 min 15 sec remaining: 0 min 0 sec cpu: 8222.7 % mem: [19826, 19827, 33965] MB
[removing bulges, pass 9 ] 100 % elapsed: 1 min 36 sec remaining: 0 min 0 sec cpu: 23186.3 % mem: [19831, 19831, 33965] MB
[removing ec, pass 9 ] 100 % elapsed: 0 min 27 sec remaining: 0 min 0 sec cpu: 23680.5 % mem: [19825, 19826, 33965] MB
[removing tips, pass 10 ] 100 % elapsed: 0 min 16 sec remaining: 0 min 0 sec cpu: 8274.7 % mem: [19825, 19826, 33965] MB
[removing bulges, pass 10 ] 100 % elapsed: 1 min 36 sec remaining: 0 min 0 sec cpu: 23208.0 % mem: [19829, 19829, 33965] MB
[removing ec, pass 10 ] 100 % elapsed: 0 min 28 sec remaining: 0 min 0 sec cpu: 23634.8 % mem: [19825, 19825, 33965] MB
[Minia : assembly ] 100 % elapsed: 8 min 6 sec remaining: 0 min 0 sec cpu: 100.2 % mem: [19798, 19798, 33965] MB
[Approximating frequencies of minimizers ] 100 % elapsed: 1 min 17 sec remaining: 0 min 0 sec cpu: 99.8 % mem: [ 28, 28, 28] MB
[DSK: nb solid kmers found : 678092331 ] 201 % elapsed: 88 min 9 sec remaining: 0 min 0 sec cpu: 1630.6 % mem: [7642, 7848, 7871] MB
[Iterating DSK partitions ] 0 % elapsed: 0 min 0 sec remaining: 0 min 0 sec
/home/pavel/bin/gatb-minia-pipeline/tools/memused: line 18: 249751 Segmentation fault "$@"
from minia.
Hi Mozart, thanks for reporting it. I'm assuming you cannot share the data either, so I'd be curious to see whether the problem occurs with a different k-mer combination. Can you please try the following command line?
./gatb --kmer-sizes 31,51,71 -1 [..] -2 [..] -o [..]
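For anyone who wants to try several k-mer combinations in a row, here is a minimal sketch that only builds the command lines. It uses the `--kmer-sizes`, `-1`, `-2` and `-o` options quoted in this thread; the function name, the read file names, the output prefixes and the second combination are hypothetical placeholders, not something the pipeline prescribes.

```python
import shlex


def gatb_command(kmer_sizes, reads_1, reads_2, out_prefix, gatb="./gatb"):
    """Build the argument list for one pipeline run with the given k-mer sizes."""
    if not kmer_sizes:
        raise ValueError("at least one k-mer size is required")
    return [
        gatb,
        "--kmer-sizes", ",".join(str(k) for k in kmer_sizes),
        "-1", reads_1,
        "-2", reads_2,
        "-o", out_prefix,
    ]


# The first combination is the one suggested above; the second is a
# hypothetical alternative added purely for illustration.
combos = [(31, 51, 71), (21, 41, 61)]
for combo in combos:
    prefix = "asm_k" + "_".join(str(k) for k in combo)  # hypothetical output prefix
    argv = gatb_command(combo, "reads_R1.fastq.gz", "reads_R2.fastq.gz", prefix)
    print(shlex.join(argv))  # print only; pass argv to subprocess.run to execute
```

Printing the commands first makes it easy to check them before launching long runs.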
from minia.
Hi Pavel,
Very nice! Please email me at [email protected]. Any way to get the data is fine, if you can share it on your end. Otherwise I'll send a shared dropbox link.
Rayan
from minia.
Hi Rayan,
I have a similar problem with DSK (core dumped). The log is as follows:
Minia 3, git commit 4b32fec
setting storage type to hdf5
[Approximating frequencies of minimizers ] 100 % elapsed: 2 min 45 sec remaining: 0 min 0 sec cpu: 99.9 % mem: [ 25, 25, 25] MB
[DSK: Pass 1/1, Step 2: counting kmers ] 73 % elapsed: 145 min 33 sec remaining: 53 min 49 sec cpu: 553.2 % mem: [36580, 36580, 36580] MB
/home/adigenova/binaries/gatb-minia-pipeline/tools/memused: line 18: 33019 Segmentation fault (core dumped) "$@"
maximal memory used: 41023 MB
(2018-09-18 19:12:07) Execution of 'minia/minia' failed. Command line:
/home/adigenova/binaries/gatb-minia-pipeline/tools/memused /home/adigenova/binaries/gatb-minia-pipeline/minia/minia -in MMK.list_reads -kmer-size 61 -abundance-min 2 -out MMK_k61 -max-memory 35000
The command line that I used was the following:
~/binaries/gatb-minia-pipeline/gatb -l sequences.txt --nb-cores 20 --max-memory 35000 -o MMK --kmer-sizes 31,61,91 --abundance-mins 2,2,2
and the sequences.txt file contains the following reads:
stLFR.split_read.1.fq.gz
stLFR.split_read.2.fq.gz
the reads are from GIAB:
ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/NA12878/stLFR/stLFR.split_read.1.fq.gz
ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/NA12878/stLFR/stLFR.split_read.2.fq.gz
Let me know if you need more information to reproduce the error.
Thanks in advance.
All the best,
Alex
from minia.
Hi Alex,
Thanks for the very detailed bug report, and sorry for the delayed answer. Can you reproduce this problem on another machine? Unfortunately I cannot: the pipeline finished without crashing on my server using the command line you provided, on the master branch of the gatb-pipeline repo.
from minia.
After an offline discussion with Alex, it seems that the problem he reported was caused by memory usage exceeding what was available on the system.
I'm closing this thread as it's aggregating a bunch of unrelated segfault problems.
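For readers who hit a similar crash, a quick way to check the memory explanation is to compare the `mem: [...] MB` fields in the progress log against the memory budget. The sketch below is only an illustration: the regular expression is inferred from the log lines pasted in this thread, the function names are hypothetical, and it simply takes the largest of the three numbers without assuming what each one means.

```python
import re

# Pattern inferred from the progress lines pasted in this thread,
# e.g. "mem: [36580, 36580, 36580] MB". The meaning of each of the three
# values is not assumed here; only the largest one is used.
MEM_FIELD = re.compile(r"mem:\s*\[\s*(\d+),\s*(\d+),\s*(\d+)\]\s*MB")


def peak_memory_mb(log_text):
    """Return the largest value found in any mem field, or None if there is none."""
    peaks = [max(int(v) for v in m.groups()) for m in MEM_FIELD.finditer(log_text)]
    return max(peaks) if peaks else None


def exceeds_budget(log_text, budget_mb):
    """True if the log reports more memory than the given budget in MB."""
    peak = peak_memory_mb(log_text)
    return peak is not None and peak > budget_mb


# Progress line taken from the log posted above.
line = ("[DSK: Pass 1/1, Step 2: counting kmers ] 73 % elapsed: 145 min 33 sec "
        "remaining: 53 min 49 sec cpu: 553.2 % mem: [36580, 36580, 36580] MB")
print(peak_memory_mb(line))         # 36580
print(exceeds_budget(line, 35000))  # True
```

In the log above the reported usage (36580 MB, with "maximal memory used: 41023 MB" afterwards) is already higher than the 35000 passed to `-max-memory`, so the budget should be compared with what the machine actually has free rather than treated as a hard cap.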
from minia.