Comments (9)

jnalanko commented on August 11, 2024

Hi! Does this happen during query or construction? Is there any error message? If so, could you paste it here?

Are you using --file-colors by any chance? That option will use GGCAT to construct the graph, but we just found out yesterday that there might be a deadlock in GGCAT. We have submitted an issue to the developers of GGCAT: algbio/ggcat#18. Options --sequence-colors and --manual-colors are not affected by this.

from themisto.

clb21565 commented on August 11, 2024

subprocess.CalledProcessError: Command 'build_index --k 31 --n-threads 128 --input-file test/klebs/ref.fasta.gz --auto-colors --index-dir test/klebs/ref_idx --temp-dir test/klebs/ref_idx/tmp' returned non-zero exit status 1.

(mGEMS3) build_index --k 31 --n-threads 128 --input-file test/klebs/ref.fasta.gz --auto-colors --index-dir test/klebs/ref_idx --temp-dir test/klebs/ref_idx/tmp

0.0420 Wed Feb 1 15:12:59 2023 Themisto-v1.2.0
0.0420 Wed Feb 1 15:12:59 2023 Maximum k-mer length (size of the de Bruijn graph node labels): 63
Input file = test/klebs/ref.fasta.gz
Input format = gzip
Index directory = test/klebs/ref_idx
Temporary directory = test/klebs/ref_idx/tmp
k = 31
Number of threads = 128
Memory megabytes = 1000
Automatic colors = true
Load BOSS = false
Preprocessing buffer size = 4096
0.0440 Wed Feb 1 15:12:59 2023 Starting
0.0440 Wed Feb 1 15:12:59 2023 Decompressing the input file
2.4960 Wed Feb 1 15:13:02 2023 Making all characters upper case and replacing non-{A,C,G,T} characters with random characters from {A,C,G,T}
8.2380 Wed Feb 1 15:13:07 2023 Replaced 1751985 characters
8.2380 Wed Feb 1 15:13:07 2023 Building BOSS
8.2380 Wed Feb 1 15:13:07 2023 Building KMC database
Validating input alphabet
Calling KMC with: kmc -b -fm -k32 -m1 -ci1 -cs1 -cx4294967295 -t128 test/klebs/ref_idx/tmp/seqs-J0AGuvoYp4NSGksi1wKUxfrRr test/klebs/ref_idx/tmp/KMCgCilginfe6FO3RZfzVMhRrzlJ test/klebs/ref_idx/tmp
Warning: number of threads is reduced to 64 (maximun numer of threads equals 64 * value of the -m parameter)


Stage 1: 100%*** Error in `build_index': double free or corruption (out): 0x00002aaae18b9010 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x81329)[0x2aaaab8c0329]
build_index(_ZN4CKMCILj1EE7ProcessEv+0x157c)[0x5e4c7c]
build_index(_Z8call_kmciPPc+0x3535)[0x592155]
build_index(_Z11KMC_wrapperlllSsSsb+0x9ba)[0x592c4a]
build_index(_ZN8Themisto14construct_bossESslllb+0xd2)[0x587002]
build_index(_Z5main2iPPc+0x2cf8)[0x52a448]
build_index(main+0xd3)[0x516bf3]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x2aaaab861555]
build_index[0x517fbe]
======= Memory map: ========
00400000-00507000 r--p 00000000 00:2c 283253988 /projects/ciwars/pathogen_annotation/pathogen_workdir/221123_themisto/build_index
00507000-0089b000 r-xp 00107000 00:2c 283253988 /projects/ciwars/pathogen_annotation/pathogen_workdir/221123_themisto/build_index
0089b000-00933000 r--p 0049b000 00:2c 283253988 /projects/ciwars/pathogen_annotation/pathogen_workdir/221123_themisto/build_index
00933000-0093c000 r--p 00532000 00:2c 283253988 /projects/ciwars/pathogen_annotation/pathogen_workdir/221123_themisto/build_index
0093c000-0093e000 rw-p 0053b000 00:2c 283253988 /projects/ciwars/pathogen_annotation/pathogen_workdir/221123_themisto/build_index
0093e000-0277a000 rw-p 00000000 00:00 0 [heap]
2aaaaaaab000-2aaaaaacd000 r-xp 00000000 08:03 12699918 /usr/lib64/ld-2.17.so
2aaaaaacd000-2aaaaaacf000 r-xp 00000000 00:00 0 [vdso]
2aaaaaacf000-2aaaaaad1000 rw-p 00000000 00:00 0
2aaaaaaee000-2aaaaaaf3000 rw-p 00000000 00:00 0
2aaaaaccc000-2aaaaaccd000 r--p 00021000 08:03 12699918 /usr/lib64/ld-2.17.so
2aaaaaccd000-2aaaaacce000 rw-p 00022000 08:03 12699918 /usr/lib64/ld-2.17.so
2aaaaacce000-2aaaaaccf000 rw-p 00000000 00:00 0
2aaaaaccf000-2aaaaace4000 r-xp 00000000 08:03 14401574 /usr/lib64/libz.so.1.2.7
2aaaaace4000-2aaaaaee3000 ---p 00015000 08:03 14401574 /usr/lib64/libz.so.1.2.7
2aaaaaee3000-2aaaaaee4000 r--p 00014000 08:03 14401574 /usr/lib64/libz.so.1.2.7
2aaaaaee4000-2aaaaaee5000 rw-p 00015000 08:03 14401574 /usr/lib64/libz.so.1.2.7
2aaaaaee5000-2aaaaaefc000 r-xp 00000000 08:03 13360674 /usr/lib64/libpthread-2.17.so
2aaaaaefc000-2aaaab0fb000 ---p 00017000 08:03 13360674 /usr/lib64/libpthread-2.17.so
2aaaab0fb000-2aaaab0fc000 r--p 00016000 08:03 13360674 /usr/lib64/libpthread-2.17.so
2aaaab0fc000-2aaaab0fd000 rw-p 00017000 08:03 13360674 /usr/lib64/libpthread-2.17.so
2aaaab0fd000-2aaaab101000 rw-p 00000000 00:00 0
2aaaab101000-2aaaab126000 r-xp 00000000 08:03 13011462 /usr/lib64/libgomp.so.1.0.0
2aaaab126000-2aaaab325000 ---p 00025000 08:03 13011462 /usr/lib64/libgomp.so.1.0.0
2aaaab325000-2aaaab326000 r--p 00024000 08:03 13011462 /usr/lib64/libgomp.so.1.0.0
2aaaab326000-2aaaab327000 rw-p 00025000 08:03 13011462 /usr/lib64/libgomp.so.1.0.0
2aaaab327000-2aaaab428000 r-xp 00000000 08:03 13146652 /usr/lib64/libm-2.17.so
2aaaab428000-2aaaab627000 ---p 00101000 08:03 13146652 /usr/lib64/libm-2.17.so
2aaaab627000-2aaaab628000 r--p 00100000 08:03 13146652 /usr/lib64/libm-2.17.so
2aaaab628000-2aaaab629000 rw-p 00101000 08:03 13146652 /usr/lib64/libm-2.17.so
2aaaab629000-2aaaab63e000 r-xp 00000000 08:03 12954121 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
2aaaab63e000-2aaaab83d000 ---p 00015000 08:03 12954121 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
2aaaab83d000-2aaaab83e000 r--p 00014000 08:03 12954121 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
2aaaab83e000-2aaaab83f000 rw-p 00015000 08:03 12954121 /usr/lib64/libgcc_s-4.8.5-20150702.so.1
2aaaab83f000-2aaaaba03000 r-xp 00000000 08:03 12889094 /usr/lib64/libc-2.17.so
2aaaaba03000-2aaaabc02000 ---p 001c4000 08:03 12889094 /usr/lib64/libc-2.17.so
2aaaabc02000-2aaaabc06000 r--p 001c3000 08:03 12889094 /usr/lib64/libc-2.17.so
2aaaabc06000-2aaaabc08000 rw-p 001c7000 08:03 12889094 /usr/lib64/libc-2.17.so
2aaaabc08000-2aaad96b8000 rw-p 00000000 00:00 0
2aaae18b9000-2aaae57ba000 rw-p 00000000 00:00 0
2aaae8bd4000-2aaae8bd5000 ---p 00000000 00:00 0
2aaae8bd5000-2aaae8dd5000 rw-p 00000000 00:00 0
2aaae8dd5000-2aaae8dd6000 ---p 00000000 00:00 0
2aaae8dd6000-2aaae8fd6000 rw-p 00000000 00:00 0
2aaae8fd6000-2aaae8fd7000 ---p 00000000 00:00 0
2aaae8fd7000-2aaae91d7000 rw-p 00000000 00:00 0
2aaae91d7000-2aaae91d8000 ---p 00000000 00:00 0
2aaae91d8000-2aaae93d8000 rw-p 00000000 00:00 0
2aaae93d8000-2aaae93d9000 ---p 00000000 00:00 0
2aaae93d9000-2aaae95d9000 rw-p 00000000 00:00 0
2aaae95d9000-2aaae95da000 ---p 00000000 00:00 0
2aaae95da000-2aaae97da000 rw-p 00000000 00:00 0
2aaae97da000-2aaae97db000 ---p 00000000 00:00 0
2aaae97db000-2aaae99db000 rw-p 00000000 00:00 0
2aaae99db000-2aaae99dc000 ---p 00000000 00:00 0
2aaae99dc000-2aaae9bdc000 rw-p 00000000 00:00 0
2aaae9bdc000-2aaae9bdd000 ---p 00000000 00:00 0
2aaae9bdd000-2aaae9ddd000 rw-p 00000000 00:00 0
2aaae9ddd000-2aaae9dde000 ---p 00000000 00:00 0
2aaae9dde000-2aaae9fde000 rw-p 00000000 00:00 0
2aaae9fde000-2aaae9fdf000 ---p 00000000 00:00 0
2aaae9fdf000-2aaaea1df000 rw-p 00000000 00:00 0
2aaaea1df000-2aaaea1e0000 ---p 00000000 00:00 0
2aaaea1e0000-2aaaea3e0000 rw-p 00000000 00:00 0
2aaaea3e0000-2aaaea3e1000 ---p 00000000 00:00 0
2aaaea3e1000-2aaaea5e1000 rw-p 00000000 00:00 0
2aaaea5e1000-2aaaea5e2000 ---p 00000000 00:00 0
2aaaea5e2000-2aaaea7e2000 rw-p 00000000 00:00 0
2aaaea7e2000-2aaaea7e3000 ---p 00000000 00:00 0
2aaaea7e3000-2aaaea9e3000 rw-p 00000000 00:00 0
2aaaea9e3000-2aaaea9e4000 ---p 00000000 00:00 0
2aaaea9e4000-2aaaeabe4000 rw-p 00000000 00:00 0
2aaaeabe4000-2aaaeabe5000 ---p 00000000 00:00 0
2aaaeabe5000-2aaaeade5000 rw-p 00000000 00:00 0
2aaaeade5000-2aaaeade6000 ---p 00000000 00:00 0
2aaaeade6000-2aaaeb336000 rw-p 00000000 00:00 0
2aaaeb336000-2aaaeb337000 ---p 00000000 00:00 0
2aaaeb337000-2aaaeb537000 rw-p 00000000 00:00 0
2aaaeb738000-2aaaeb739000 ---p 00000000 00:00 0
2aaaeb739000-2aaaeb939000 rw-p 00000000 00:00 0
2aaaf4000000-2aaaf4021000 rw-p 00000000 00:00 0
2aaaf4021000-2aaaf8000000 ---p 00000000 00:00 0
2aaaf8000000-2aaaf8021000 rw-p 00000000 00:00 0
2aaaf8021000-2aaafc000000 ---p 00000000 00:00 0
2aaafc000000-2aaafc021000 rw-p 0000000
caught signal: 6
Cleaning up temporary files
Aborting

tmaklin commented on August 11, 2024

Hi Connor, I would try running Themisto with more memory (you can control this via the --mem-megas option) or with fewer threads. The error you're getting suggests a memory allocation problem, and KMC is printing a warning related to the number of threads: "Warning: number of threads is reduced to 64 (maximun numer of threads equals 64 * value of the -m parameter)", so this is likely the culprit.
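
To make the arithmetic in that warning concrete, here is a minimal Python sketch. It assumes, based only on the log above (Memory megabytes = 1000 alongside `-m1` in the KMC call), that the memory budget reaches KMC as whole gigabytes with a floor of 1. The helper names are hypothetical and not part of Themisto or KMC, and the command only reuses flags that appear in this thread.

```python
import math

# From the KMC warning in the log: max threads = 64 * value of the -m parameter.
KMC_THREADS_PER_GB = 64


def kmc_mem_gb(mem_megas: int) -> int:
    """Assumed conversion to KMC's -m value: whole gigabytes, never below 1."""
    return max(1, mem_megas // 1000)


def kmc_thread_cap(mem_megas: int) -> int:
    """Largest thread count KMC would accept for this memory budget."""
    return KMC_THREADS_PER_GB * kmc_mem_gb(mem_megas)


def min_mem_megas_for(threads: int) -> int:
    """Smallest memory budget (in MB) whose thread cap covers `threads`."""
    return 1000 * math.ceil(threads / KMC_THREADS_PER_GB)


def build_index_cmd(input_file: str, index_dir: str, temp_dir: str,
                    threads: int, mem_megas: int) -> list[str]:
    """Assemble the build_index call from the log, plus --mem-megas."""
    return [
        "build_index", "--k", "31",
        "--n-threads", str(threads),
        "--mem-megas", str(mem_megas),
        "--input-file", input_file,
        "--auto-colors",
        "--index-dir", index_dir,
        "--temp-dir", temp_dir,
    ]


if __name__ == "__main__":
    threads = 128
    mem = min_mem_megas_for(threads)
    print("thread cap at 1000 MB:", kmc_thread_cap(1000))
    print("memory needed for", threads, "threads:", mem, "MB")
    print(" ".join(build_index_cmd(
        "test/klebs/ref.fasta.gz", "test/klebs/ref_idx",
        "test/klebs/ref_idx/tmp", threads, mem)))
```

Under these assumptions the default 1000 MB caps KMC at 64 threads, which matches the warning, and 128 threads would need at least 2000 MB. This only illustrates the cap; it does not by itself show that the cap causes the crash.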

Also, it looks like demix_check hasn't been updated in a while and is using a somewhat old version of Themisto. I'll have to contact the devs about updating it once we finish our release of Themisto 3.0, which is much more efficient than v1.x.x or v2.x.x.

clb21565 commented on August 11, 2024

Ah, this makes sense, thank you! I will now just use demix_check check and do the initial steps manually. I will update when I confirm this fixes it.
Connor

clb21565 commented on August 11, 2024

One follow-up, though, in case you see this before I just replicate what was in the NComs paper: v2.1 requires a color file for building an index on a set of mixed references. Does 1 color = 1 distinct genome, e.g., 1 color for 1 E. coli isolate? That is, do I need to give the same color to all contigs belonging to a given reference genome, or is it OK to have multiple colors for different contigs of the same genome?

clb21565 commented on August 11, 2024

Adjusting the memory worked, thank you! I edited line 89 of demix_check/reference.py to adjust the memory and posted a suggestion to demix_check. Closing for now.

tmaklin commented on August 11, 2024

Great to hear that worked!

In the NComs paper the indexes had 1 color for 1 isolate, which in practice means that all the contigs in the same genome have the same color.
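
As a rough illustration of "1 color for 1 isolate", here is a small Python sketch that assigns one integer color per genome and repeats it for every contig of that genome. It assumes the color file holds one integer per line, one line per sequence, in the same order as the sequences in the input FASTA; please check that against the documentation of the Themisto version you use. The function name is hypothetical.

```python
def color_lines(genomes: list[str]) -> list[str]:
    """Return one color per contig, in input order.

    `genomes` holds the FASTA text of each isolate, in the order the
    isolates are concatenated into the reference file. Every contig of
    isolate i gets color i.
    """
    lines: list[str] = []
    for color, fasta_text in enumerate(genomes):
        # Each FASTA header line (starting with '>') is one contig.
        n_contigs = sum(1 for line in fasta_text.splitlines()
                        if line.startswith(">"))
        lines.extend([str(color)] * n_contigs)
    return lines


if __name__ == "__main__":
    isolate_a = ">a_contig1\nACGT\n>a_contig2\nGGCA\n"
    isolate_b = ">b_contig1\nTTGA\n"
    # Two contigs for the first isolate, one for the second.
    print("\n".join(color_lines([isolate_a, isolate_b])))
```

Here the two contigs of the first isolate share color 0 and the single contig of the second isolate gets color 1.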

clb21565 commented on August 11, 2024

Oof, OK. Increasing memory stopped the error from happening, but building the index has taken days for ~5,000 genomes. I get that that's a lot of genomes, but the computation time is a real problem. Is the only option to reduce the number of genomes?

tmaklin commented on August 11, 2024

That doesn't sound right. If you're using v2.1, it should take much less than 5 days. For reference, the index of 14500 E. coli genomes in the NComs paper took less than a day to build with v2.1 using 20 threads, and v3.0, which is being prepared for release, can index the same data in ~4 h.

The runtime in v2.1 is mainly governed by 3 factors: the number of threads, the amount of RAM, and the speed of your hard drive storage. If Themisto runs out of RAM when indexing it resorts to using hard disk space (provided via the --temp-dir argument). This can slow things down if --temp-dir is not on an SSD, so increasing the amount of memory by as much as you can could help bring the runtime down.
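
If it helps, here is a minimal Python sketch for picking a value to pass to --mem-megas from the machine's memory while leaving some headroom. The 20% headroom default and the function names are assumptions for illustration only, and on a shared cluster the memory granted to your job by the scheduler matters more than the node's physical memory.

```python
import os


def mem_megas_budget(total_bytes: int, headroom_percent: int = 20) -> int:
    """Memory budget in MB after reserving `headroom_percent` of the total."""
    if not 0 <= headroom_percent < 100:
        raise ValueError("headroom_percent must be in [0, 100)")
    usable_bytes = total_bytes * (100 - headroom_percent) // 100
    return usable_bytes // 1_000_000


def physical_memory_bytes() -> int:
    """Physical memory of this machine (works on Linux; not portable)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


if __name__ == "__main__":
    budget = mem_megas_budget(physical_memory_bytes())
    print(f"--mem-megas {budget}")
```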
