Comments (9)
Hi! Does this happen during query or construction? Is there any error message? If so, could you paste it here?
Are you using --file-colors by any chance? That option will use GGCAT to construct the graph, but we just found out yesterday that there might be a deadlock in GGCAT. We have submitted an issue to the developers of GGCAT: algbio/ggcat#18. Options --sequence-colors and --manual-colors are not affected by this.
from themisto.
subprocess.CalledProcessError: Command 'build_index --k 31 --n-threads 128 --input-file test/klebs/ref.fasta.gz --auto-colors --index-dir test/klebs/ref_idx --temp-dir test/klebs/ref_idx/tmp' returned non-zero exit status 1.
(mGEMS3) build_index --k 31 --n-threads 128 --input-file test/klebs/ref.fasta.gz --auto-colors --index-dir test/klebs/ref_idx --temp-dir test/klebs/ref_idx/tmp
0.0420 Wed Feb 1 15:12:59 2023 Themisto-v1.2.0
0.0420 Wed Feb 1 15:12:59 2023 Maximum k-mer length (size of the de Bruijn graph node labels): 63
Input file = test/klebs/ref.fasta.gz
Input format = gzip
Index directory = test/klebs/ref_idx
Temporary directory = test/klebs/ref_idx/tmp
k = 31
Number of threads = 128
Memory megabytes = 1000
Automatic colors = true
Load BOSS = false
Preprocessing buffer size = 4096
0.0440 Wed Feb 1 15:12:59 2023 Starting
0.0440 Wed Feb 1 15:12:59 2023 Decompressing the input file
2.4960 Wed Feb 1 15:13:02 2023 Making all characters upper case and replacing non-{A,C,G,T} characters with random characters from {A,C,G,T}
8.2380 Wed Feb 1 15:13:07 2023 Replaced 1751985 characters
8.2380 Wed Feb 1 15:13:07 2023 Building BOSS
8.2380 Wed Feb 1 15:13:07 2023 Building KMC database
Validating input alphabet
Calling KMC with: kmc -b -fm -k32 -m1 -ci1 -cs1 -cx4294967295 -t128 test/klebs/ref_idx/tmp/seqs-J0AGuvoYp4NSGksi1wKUxfrRr test/klebs/ref_idx/tmp/KMCgCilginfe6FO3RZfzVMhRrzlJ test/klebs/ref_idx/tmp
Warning: number of threads is reduced to 64 (maximun numer of threads equals 64 * value of the -m parameter)
Stage 1: 100%*** Error in `build_index': double free or corruption (out): 0x00002aaae18b9010 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x81329)[0x2aaaab8c0329]
build_index(_ZN4CKMCILj1EE7ProcessEv+0x157c)[0x5e4c7c]
build_index(_Z8call_kmciPPc+0x3535)[0x592155]
build_index(_Z11KMC_wrapperlllSsSsb+0x9ba)[0x592c4a]
build_index(_ZN8Themisto14construct_bossESslllb+0xd2)[0x587002]
build_index(_Z5main2iPPc+0x2cf8)[0x52a448]
build_index(main+0xd3)[0x516bf3]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x2aaaab861555]
build_index[0x517fbe]
======= Memory map: ========
00400000-00507000 r--p 00000000 00:2c 283253988 /projects/ciwars/pathogen_annotation/pathogen_workdir/221123_themisto/build_index
00507000-0089b000 r-xp 00107000 00:2c 283253988 /projects/ciwars/pathogen_annotation/pathogen_workdir/221123_themisto/build_index
0089b000-00933000 r--p 0049b000 00:2c 283253988 /projects/ciwars/pathogen_annotation/pathogen_workdir/221123_themisto/build_index
00933000-0093c000 r--p 00532000 00:2c 283253988 /projects/ciwars/pathogen_annotation/pathogen_workdir/221123_themisto/build_index
0093c000-0093e000 rw-p 0053b000 00:2c 283253988 /projects/ciwars/pathogen_annotation/pathogen_workdir/221123_themisto/build_index
0093e000-0277a000 rw-p 00000000 00:00 0 [heap]
2aaaaaaab000-2aaaaaacd000 r-xp 00000000 08:03 12699918 /usr/lib64/ld-2.17.so
2aaaaaacd000-2aaaaaacf000 r-xp 00000000 00:00 0 [vdso]
2aaaaaacf000-2aaaaaad1000 rw-p 00000000 00:00 0
2aaaaaaee000-2aaaaaaf3000 rw-p 00000000 00:00 0
2aaaaaccc000-2aaaaaccd000 r--p 00021000 08:03 12699918 /usr/lib64/ld-2.17.so
2aaaaaccd000-2aaaaacce000 rw-p 00022000 08:03 12699918 /usr/lib64/ld-2.17.so
2aaaaacce000-2aaaaaccf000 rw-p 00000000 00:00 0
2aaaaaccf000-2aaaaace4000 r-xp 00000000 08:03 14401574 /usr/lib64/libz.so.1.2.7
2aaaaace4000-2aaaaaee3000 ---p 00015000 08:03 14401574 /usr/lib64/libz.so.1.2.7
2aaaaaee3000-2aaaaaee4000 r--p 00014000 08:03 14401574 /usr/lib64/libz.so.1.2.7
2aaaaaee4000-2aaaaaee5000 rw-p 00015000 08:03 14401574 /usr/lib64/libz.so.1.2.7
2aaaaaee5000-2aaaaaefc000 r-xp 00000000 08:03 13360674 /usr/lib64/libpthread-2.17.so
2aaaaaefc000-2aaaab0fb000 ---p 00017000 08:03 13360674 /usr/lib64/libpthread-2.17.so
2aaaab0fb000-2aaaab0fc000 r--p 00016000 08:03 13360674 /usr/lib64/libpthread-2.17.so
2aaaab0fc000-2aaaab0fd000 rw-p 00017000 08:03 13360674 /usr/lib64/libpthread-2.17.so
2aaaab0fd000-2aaaab101000 rw-p 00000000 00:00 0
2aaaab101000-2aaaab126000 r-xp 00000000 08:03 13011462 /usr/lib64/libgomp.so.1.0.0
2aaaab126000-2aaaab325000 ---p 00025000 08:03 13011462 /usr/lib64/libgomp.so.1.0.0
2aaaab325000-2aaaab326000 r--p 00024000 08:03 13011462 /usr/lib64/libgomp.so.1.0.0
2aaaab326000-2aaaab327000 rw-p 00025000 08:03 13011462 /usr/lib64/libgomp.so.1.0.0
2aaaab327000-2aaaab428000 r-xp 00000000 08:03 13146652 /usr/lib64/libm-2.17.so
2aaaab428000-2aaaab627000 ---p 00101000 08:03 13146652 /usr/lib64/libm-2.17.so
2aaaab627000-2aaaab628000 r--p 00100000 08:03 13146652 /usr/lib64/libm-2.17.so
2aaaab628000-2aaaab629000 rw-p 00101000 08:03 13146652 /usr/lib64/libm-2.17.so
2aaaab629000-2aaaab63e000 r-xp 00000000 08:03 12954121 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
2aaaab63e000-2aaaab83d000 ---p 00015000 08:03 12954121 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
2aaaab83d000-2aaaab83e000 r--p 00014000 08:03 12954121 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
2aaaab83e000-2aaaab83f000 rw-p 00015000 08:03 12954121 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
2aaaab83f000-2aaaaba03000 r-xp 00000000 08:03 12889094 /usr/lib64/libc-2.17.so
2aaaaba03000-2aaaabc02000 ---p 001c4000 08:03 12889094 /usr/lib64/libc-2.17.so
2aaaabc02000-2aaaabc06000 r--p 001c3000 08:03 12889094 /usr/lib64/libc-2.17.so
2aaaabc06000-2aaaabc08000 rw-p 001c7000 08:03 12889094 /usr/lib64/libc-2.17.so
2aaaabc08000-2aaad96b8000 rw-p 00000000 00:00 0
2aaae18b9000-2aaae57ba000 rw-p 00000000 00:00 0
2aaae8bd4000-2aaae8bd5000 ---p 00000000 00:00 0
2aaae8bd5000-2aaae8dd5000 rw-p 00000000 00:00 0
2aaae8dd5000-2aaae8dd6000 ---p 00000000 00:00 0
2aaae8dd6000-2aaae8fd6000 rw-p 00000000 00:00 0
2aaae8fd6000-2aaae8fd7000 ---p 00000000 00:00 0
2aaae8fd7000-2aaae91d7000 rw-p 00000000 00:00 0
2aaae91d7000-2aaae91d8000 ---p 00000000 00:00 0
2aaae91d8000-2aaae93d8000 rw-p 00000000 00:00 0
2aaae93d8000-2aaae93d9000 ---p 00000000 00:00 0
2aaae93d9000-2aaae95d9000 rw-p 00000000 00:00 0
2aaae95d9000-2aaae95da000 ---p 00000000 00:00 0
2aaae95da000-2aaae97da000 rw-p 00000000 00:00 0
2aaae97da000-2aaae97db000 ---p 00000000 00:00 0
2aaae97db000-2aaae99db000 rw-p 00000000 00:00 0
2aaae99db000-2aaae99dc000 ---p 00000000 00:00 0
2aaae99dc000-2aaae9bdc000 rw-p 00000000 00:00 0
2aaae9bdc000-2aaae9bdd000 ---p 00000000 00:00 0
2aaae9bdd000-2aaae9ddd000 rw-p 00000000 00:00 0
2aaae9ddd000-2aaae9dde000 ---p 00000000 00:00 0
2aaae9dde000-2aaae9fde000 rw-p 00000000 00:00 0
2aaae9fde000-2aaae9fdf000 ---p 00000000 00:00 0
2aaae9fdf000-2aaaea1df000 rw-p 00000000 00:00 0
2aaaea1df000-2aaaea1e0000 ---p 00000000 00:00 0
2aaaea1e0000-2aaaea3e0000 rw-p 00000000 00:00 0
2aaaea3e0000-2aaaea3e1000 ---p 00000000 00:00 0
2aaaea3e1000-2aaaea5e1000 rw-p 00000000 00:00 0
2aaaea5e1000-2aaaea5e2000 ---p 00000000 00:00 0
2aaaea5e2000-2aaaea7e2000 rw-p 00000000 00:00 0
2aaaea7e2000-2aaaea7e3000 ---p 00000000 00:00 0
2aaaea7e3000-2aaaea9e3000 rw-p 00000000 00:00 0
2aaaea9e3000-2aaaea9e4000 ---p 00000000 00:00 0
2aaaea9e4000-2aaaeabe4000 rw-p 00000000 00:00 0
2aaaeabe4000-2aaaeabe5000 ---p 00000000 00:00 0
2aaaeabe5000-2aaaeade5000 rw-p 00000000 00:00 0
2aaaeade5000-2aaaeade6000 ---p 00000000 00:00 0
2aaaeade6000-2aaaeb336000 rw-p 00000000 00:00 0
2aaaeb336000-2aaaeb337000 ---p 00000000 00:00 0
2aaaeb337000-2aaaeb537000 rw-p 00000000 00:00 0
2aaaeb738000-2aaaeb739000 ---p 00000000 00:00 0
2aaaeb739000-2aaaeb939000 rw-p 00000000 00:00 0
2aaaf4000000-2aaaf4021000 rw-p 00000000 00:00 0
2aaaf4021000-2aaaf8000000 ---p 00000000 00:00 0
2aaaf8000000-2aaaf8021000 rw-p 00000000 00:00 0
2aaaf8021000-2aaafc000000 ---p 00000000 00:00 0
2aaafc000000-2aaafc021000 rw-p 0000000
caught signal: 6
Cleaning up temporary files
Aborting
Hi Connor, I would try running Themisto with more memory (you can control this via the --mem-megas option) or using fewer threads. The error you're getting suggests a problem with memory allocation, and KMC is also printing a warning related to the number of threads:
Warning: number of threads is reduced to 64 (maximun numer of threads equals 64 * value of the -m parameter)
so this is likely the culprit.
Also, it looks like demix_check hasn't been updated in a while and is using a somewhat old version of Themisto. I'll have to contact the devs about updating it once we finish our release of Themisto 3.0, which is much more efficient than v1.x.x or v2.x.x.
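To make the thread and memory interaction concrete, here is a rough Python sketch of the rule quoted in the KMC warning (at most 64 threads per unit of -m), together with a helper that assembles the build_index command from the log with --mem-megas added. The function names are hypothetical, and the mapping from megabytes to the -m value is an assumption inferred from the log (Memory megabytes = 1000 alongside -m1), not something confirmed from the Themisto source.

```python
from typing import List

# Taken from the KMC warning in the log: max threads = 64 * value of -m.
KMC_THREADS_PER_M_UNIT = 64


def kmc_m_value(mem_megas: int) -> int:
    """ASSUMPTION: -m is the memory budget in whole gigabytes, at least 1.

    This matches the log (1000 MB was passed to KMC as -m1), but the exact
    conversion Themisto uses is not shown in the thread.
    """
    return max(1, mem_megas // 1000)


def effective_kmc_threads(requested_threads: int, mem_megas: int) -> int:
    """Thread count KMC would actually use under the rule in the warning."""
    return min(requested_threads, KMC_THREADS_PER_M_UNIT * kmc_m_value(mem_megas))


def build_index_argv(input_file: str, index_dir: str, temp_dir: str,
                     k: int, n_threads: int, mem_megas: int) -> List[str]:
    """Assemble the command line from the log, plus --mem-megas."""
    return [
        "build_index",
        "--k", str(k),
        "--n-threads", str(n_threads),
        "--input-file", input_file,
        "--auto-colors",
        "--index-dir", index_dir,
        "--temp-dir", temp_dir,
        "--mem-megas", str(mem_megas),
    ]


# The failing run: 128 threads requested with the 1000 MB default.
print(effective_kmc_threads(128, 1000))  # 64, as reported in the warning

# A possible retry with more memory and fewer threads (values are examples).
argv = build_index_argv("test/klebs/ref.fasta.gz", "test/klebs/ref_idx",
                        "test/klebs/ref_idx/tmp", k=31, n_threads=32,
                        mem_megas=16000)
print(" ".join(argv))
```

The list returned by build_index_argv could be passed to subprocess.run(argv, check=True); the memory and thread values above are placeholders to adapt to the machine.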
Ah, this makes sense, thank you! I will now just use demix_check check and do the initial steps manually. I will update when I confirm this fixes it.
Connor
One follow-up, though, in case you see this before I simply replicate what was in the NComs paper: v2.1 requires a color file for building an index on a set of mixed references. Does 1 color = 1 distinct genome, e.g., 1 color for 1 E. coli isolate? If so, do I need to give the same color to all contigs belonging to a given reference genome, or is it OK to have multiple colors for different contigs of the same genome?
Adjusting the memory worked, thank you! I edited demix_check/reference.py (line 89) to adjust the memory and posted a suggestion to demix_check. Closing for now.
Great to hear that worked!
In the NComs paper the indexes had 1 color for 1 isolate, which in practice means that all the contigs in the same genome have the same color.
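As an illustration of the "1 color per isolate" convention, the sketch below assigns one integer color to each genome and repeats it once per contig. It assumes the color file holds one color per line, in the same order as the sequences in the concatenated FASTA; that layout, the function names, and the toy sequences are assumptions for illustration and should be checked against the documentation for your Themisto version.

```python
from typing import Iterable, List


def count_contigs(fasta_text: str) -> int:
    """Count sequences in FASTA text by counting header lines."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


def color_lines(genome_fastas: Iterable[str]) -> List[str]:
    """Return one color per contig; every contig of genome i gets color i.

    ASSUMPTION: the genomes are concatenated into the index input in the
    same order in which they are passed here.
    """
    lines: List[str] = []
    for color, fasta_text in enumerate(genome_fastas):
        lines.extend([str(color)] * count_contigs(fasta_text))
    return lines


# Hypothetical toy input: two isolates with 2 and 3 contigs.
isolate_a = ">a_contig1\nACGT\n>a_contig2\nGGCC\n"
isolate_b = ">b_contig1\nTTAA\n>b_contig2\nCCGG\n>b_contig3\nATAT\n"

colors = color_lines([isolate_a, isolate_b])
print("\n".join(colors))
```

For the toy input this yields the colors 0, 0, 1, 1, 1: both contigs of the first isolate share one color, and the three contigs of the second isolate share another.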
Oof, OK. Increasing memory stopped the error from happening, but building the index has taken days for ~5,000 genomes. I get that that's a lot of genomes, but the computation time is a real problem. Is the only option to reduce the number of genomes?
That doesn't sound right. If you're using v2.1 it should take much less than 5 days. For reference, the index of 14,500 E. coli genomes in the NComs paper took less than a day to build with v2.1 using 20 threads, and v3.0, which is being prepared for release, can index the same data in ~4h.
The runtime in v2.1 is mainly governed by 3 factors: the number of threads, the amount of RAM, and the speed of your hard drive storage. If Themisto runs out of RAM when indexing it resorts to using hard disk space (provided via the --temp-dir argument). This can slow things down if --temp-dir is not on an SSD, so increasing the amount of memory by as much as you can could help bring the runtime down.