mklarqvist / tomahawk
Fast calculations of linkage-disequilibrium in large-scale human cohorts
Home Page: https://mklarqvist.github.io/tomahawk/
License: MIT License
I'm trying to use Tomahawk on CentOS 7. Compilation is successful, but the resulting binary crashes immediately after execution:
$ ./tomahawk
terminate called after throwing an instance of 'std::regex_error'
what(): regex_error
zsh: abort ./tomahawk
I am not using the provided install.sh script, instead using the following library versions as provided by CentOS 7:
I notice you're using a specific commit hash for htslib in install.sh. Does tomahawk depend on certain versions of any of these libraries? Thanks!
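One possible cause worth ruling out, offered as an assumption because the report does not state the compiler version: the libstdc++ shipped with GCC releases before 4.9 has an incomplete std::regex that compiles but throws std::regex_error at run time, and the stock CentOS 7 compiler is GCC 4.8.5. A minimal sketch for checking the compiler version (the helper name version_ge is made up for this example, and it relies on GNU sort -V):

```shell
# version_ge A B: succeed if dotted version A is >= B (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# -dumpfullversion exists in newer GCC releases; fall back to -dumpversion on older ones.
gcc_version=$(gcc -dumpfullversion 2>/dev/null || gcc -dumpversion 2>/dev/null)

if [ -z "$gcc_version" ]; then
    echo "gcc not found on PATH"
elif version_ge "$gcc_version" 4.9; then
    echo "gcc $gcc_version: std::regex should be usable"
else
    echo "gcc $gcc_version: std::regex is incomplete before 4.9"
fi
```

If the version turns out to be older than 4.9, rebuilding with a newer toolchain would be the first thing to try before looking at library versions.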
What are the rules for an acceptable number of chunks?
I'm having a hard time figuring it out from load_balancer_ld.cpp.
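One observation that may help, stated as an assumption rather than a confirmed rule: the -c 990 value used in the calc example further down this page is a triangular number (44 × 45 / 2), which would fit a scheme where the pairwise comparison matrix is split into an upper triangle of blocks. A minimal sketch for testing whether a candidate chunk count has that form (the function name is_triangular is made up for this example):

```shell
# is_triangular N: succeed if N = k*(k+1)/2 for some integer k >= 1.
# Assumption: valid chunk counts are triangular numbers; the source does not confirm this.
is_triangular() {
    local n=$1 k=1
    while [ $((k * (k + 1) / 2)) -lt "$n" ]; do
        k=$((k + 1))
    done
    [ $((k * (k + 1) / 2)) -eq "$n" ]
}

for c in 990 991; do
    if is_triangular "$c"; then
        echo "$c is triangular"
    else
        echo "$c is not triangular"
    fi
done
```

The authoritative answer still has to come from load_balancer_ld.cpp or the maintainer.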
I'm having trouble compiling this on Linux. I tried both your Dockerfile (docker build -t tomahawk .) and our local system (CentOS 7.8 with gcc 7.3.1) and received the same error in both cases. It looks like a problem with the way headers are being included.
g++ -std=c++0x -O3 -msse4.2 -I../htslib/ -I./include/ -I./lib/ -I/usr/local/include/ -c -DVERSION=\"beta-0.7.1\" -o lib/header_internal.o lib/header_internal.cpp
In file included from lib/header_internal.cpp:1:0:
lib/header_internal.h:16:44: error: expected class-name before '{' token
class VcfHeaderInternal : public VcfHeader {
^
lib/header_internal.h:37:28: error: 'string' in namespace 'std' does not name a type
void AddSample(const std::string& sample_name);
^
lib/header_internal.cpp: In function 'const bcf_hrec_t* tomahawk::GetPopulatedHrec(const bcf_idpair_t&)':
lib/header_internal.cpp:12:2: error: 'cerr' is not a member of 'std'
std::cerr << "No populated hrec in idPair. Error in htslib." << std::endl;
^
lib/header_internal.cpp:12:66: error: 'endl' is not a member of 'std'
std::cerr << "No populated hrec in idPair. Error in htslib." << std::endl;
...
best,
Jared
Tomahawk will not compile on target architectures without SSE4.2, as Cloudflare ZLIB requires the _mm_crc32_u32 intrinsic, which was first introduced in SSE4.2.
We are currently considering reverting this implementation to standard ZLIB.
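To check whether a Linux build host reports SSE4.2 before compiling, something like the following sketch may be enough. It assumes the kernel exposes CPU flags in /proc/cpuinfo, where the flag is spelled sse4_2; the function name has_sse42 is made up for this example.

```shell
# has_sse42 [FILE]: succeed if the CPU flag list in FILE mentions sse4_2.
# FILE defaults to /proc/cpuinfo, which only exists on Linux.
has_sse42() {
    local cpuinfo=${1:-/proc/cpuinfo}
    grep -qw 'sse4_2' "$cpuinfo" 2>/dev/null
}

if has_sse42; then
    echo "SSE4.2 reported by this CPU"
else
    echo "SSE4.2 not reported (or /proc/cpuinfo unavailable)"
fi
```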
Getting this error: tomahawk: error while loading shared libraries: libhts.so.2: cannot open shared object file: No such file or directory
Hoping someone has a fix.
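This message means the dynamic loader could not locate libhts.so.2 at run time. A common workaround is to add the directory that holds the library to LD_LIBRARY_PATH. The sketch below assumes a few candidate directories (a local htslib build next to the current directory and the usual system library paths); these are guesses and should be adjusted to wherever htslib was actually built or installed. The helper name find_lib is made up for this example.

```shell
# find_lib NAME DIR...: print the first DIR that contains a file called NAME.
find_lib() {
    local name=$1 dir
    shift
    for dir in "$@"; do
        if [ -e "$dir/$name" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Candidate directories are assumptions, not paths taken from the report.
if libdir=$(find_lib libhts.so.2 "$PWD/htslib" /usr/local/lib /usr/lib64 /usr/lib); then
    export LD_LIBRARY_PATH="$libdir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    echo "Added $libdir to LD_LIBRARY_PATH"
else
    echo "libhts.so.2 not found in the candidate directories" >&2
fi
```

Running ldd on the tomahawk binary afterwards shows whether any shared library is still listed as "not found".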
I'm trying to extract .two format data in native binary format, however this always yields a text file:
tomahawk view -i in.calc_sorted.two -B -I tig00000855 -a 0 -f 2 >tig00000855.two
How do I output binary TWO format?
Hi,
I'm trying to import a vcf file tmp_subset.vcf.gz and getting the following error:
$ tomahawk import -i out/tmp_subset.vcf -o tmp/test.twk
Program: tomahawk-beta-0.7.1-dirty (Tools for computing, querying and storing LD data)
Libraries: tomahawk-0.7.0; ZSTD-1.5.6; htslib 1.13+ds
Contact: Marcus D. R. Klarqvist [email protected]
Documentation: https://github.com/mklarqvist/tomahawk
License: MIT
[2024-05-29 10:33:41,791][LOG] Calling import...
[2024-05-29 10:33:41,792][LOG][READER] Opening out/tmp_subset.vcf...
Segmentation fault (core dumped)
The VCF file format looks OK to me so I began trying to use gdb to diagnose the problem:
$ gdb --args tomahawk import -i out/tmp_subset.vcf -o tmp/test.twk
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
https://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from tomahawk...
(gdb) break tomahawk::VcfHeaderInternal::AddContigInfo
Breakpoint 1 at 0x3ef80: file lib/header_internal.cpp, line 17.
Could you help with this?
I've been trying tomahawk with different VCF data sets. For one set, it worked very well and the results looked great. When I tried another VCF for the same region but with more samples, the "tomahawk import" step ran without complaint but did not generate a "twk" file. It did not print an error message either.
The command run was (also tried with default options):
tomahawk import -I locus.bcf -o locus_snp -m 0.2 -h 0.001
The output for the good set has these messages:
Program: tomahawk beta-0.6.1
Contact: Marcus D. R. Klarqvist <[email protected]>
Documentation: https://github.com/mklarqvist/tomahawk
License: MIT
----------
[2018-11-08 10:31:01,989][LOG] Calling import...
[2018-11-08 10:31:01,997][LOG][VCF] Constructing lookup table for 603 contigs...
[2018-11-08 10:31:01,998][LOG][RLE] Samples: 86 > 15... Skip
[2018-11-08 10:31:01,998][LOG][RLE] Samples: 86 < 4095...
[2018-11-08 10:31:01,998][LOG][RLE] Using 16-bit width...
[2018-11-08 10:31:01,998][LOG][WRITER] Opening: locus_snp.twk...
[2018-11-08 10:31:02,129][LOG][WRITER] Wrote: 2,576 variants to 6 blocks...
The messages for the bigger data set are:
Program: tomahawk beta-0.6.1
Contact: Marcus D. R. Klarqvist <[email protected]>
Documentation: https://github.com/mklarqvist/tomahawk
License: MIT
----------
[2018-11-08 10:32:18,162][LOG] Calling import...
[2018-11-08 10:32:18,164][LOG][VCF] Constructing lookup table for 603 contigs...
Apparently, tomahawk didn't import any variants.
The VCF files came from a GATK pipeline. Both VCFs are sliced at the same locus, a 2 Mbp region: the working one has 86 samples and the failing one has 353 samples.
Could you take a look at this?
We have permanently removed the Link Time Optimisation flags from the makefiles, as there is a gcc bug that triggers an internal compilation error in versions <= 4.9.2. Users seeking to squeeze out some additional performance may want to update their gcc builds and add these compiler flags back.
Hi,
I'm trying to import a bcf file that was generated by first converting a GATK vcf to bcf with bcftools. I'm getting the following error:
Program: tomahawk-beta-0.7.1 (Tools for computing, querying and storing LD data)
Libraries: tomahawk-0.7.0; ZSTD-1.4.0; htslib 1.9
Contact: Marcus D. R. Klarqvist <[email protected]>
Documentation: https://github.com/mklarqvist/tomahawk
License: MIT
----------
[2019-05-20 16:53:15,426][LOG] Calling import...
[2019-05-20 16:53:15,426][LOG][READER] Opening snp.bcf...
[2019-05-20 16:53:15,433][LOG][VCF] Constructing lookup table for 608 contigs...
[2019-05-20 16:53:15,434][LOG][VCF] Samples: 56...
[2019-05-20 16:53:15,434][LOG][WRITER] Opening snp.twk...
00000000
00001010
tomahawk: lib/core.cpp:117: void tomahawk::twk1_t::calculateHardyWeinberg(): Assertion `ref == 0 || ref == 1 || ref == 4 || ref == 5' failed.
Aborted (core dumped)
The SNPs seem to meet the expectations of the program. I'm not entirely sure what's going wrong here. Please let me know if additional info would be useful.
First of all, thank you for the amazing software!
I was able to convert all bcf files but one, so I guess the input file is the problem. Nonetheless, the error message is a bit cryptic for me:
$ ../tomahawk/tomahawk import -i BR.bcf -o BR
program: tomahawk-beta-0.7.1 (Tools for computing, querying and storing LD data)
Libraries: tomahawk-0.7.0; ZSTD-1.5.1; htslib 1.14
Contact: Marcus D. R. Klarqvist <[email protected]>
Documentation: https://github.com/mklarqvist/tomahawk
License: MIT
----------
[2021-12-28 12:10:21,629][LOG] Calling import...
[2021-12-28 12:10:21,629][LOG][READER] Opening BR.bcf...
[2021-12-28 12:10:21,637][LOG][VCF] Constructing lookup table for 22 contigs...
[2021-12-28 12:10:21,637][LOG][VCF] Samples: 171...
[2021-12-28 12:10:21,637][LOG][WRITER] Opening BR.twk...00000000
00000001
00000000
00000001
00000000
00000001
00000000
00000001
00000000
00000101
00000000
00000001
00000000
00000001
00000101
00000001
00000101
00000000
00000001
00000000
...
Assertion failed: (ref == 0 || ref == 1 || ref == 4 || ref == 5), function calculateHardyWeinberg, file lib/core.cpp, line 117.
Abort trap: 6
I have circumvented this by commenting out lines 116-117 in core.cpp and recompiling, since I know there are no HW equilibrium issues in this population. Still, it would be nice to know what is causing the problem. Should I send my .bcf file to [email protected]?
Thank you
Hi,
Is there a citation/reference for the math tomahawk does on unphased genotypes to compute LD?
I was alerted to your package recently and it looks extremely valuable, congratulations!
I do have a couple of feature requests; apologies if these are already implemented and I missed them in the documentation.
This project is very exciting! Thanks for sharing the code.
I'm just trying to follow along with your examples, and I ran into a problem.
I wonder if you can provide any advice for debugging?
These steps work OK:
wget -nc http://s3.amazonaws.com/1000genomes/release/20101123/interim_phase1_release/ALL.chr21.phase1.projectConsensus.genotypes.vcf.gz
vcfgz=ALL.chr21.phase1.projectConsensus.genotypes.vcf.gz
bcf=ALL.chr21.phase1.projectConsensus.genotypes.bcf
prefix=${vcfgz%%.vcf.gz}
tabix -p vcf ALL.chr21.phase1.projectConsensus.genotypes.vcf.gz
bcftools convert --output-type b --threads 8 --output ${prefix}.bcf $vcfgz
tomahawk import -i $bcf -o $prefix -n 0.2 -h 0.001
ls -lha ALL.chr21*
-rw-rw-r-- 1 slowikow srlab 328M Feb 2 21:04 ALL.chr21.phase1.projectConsensus.genotypes.bcf
-rw-rw-r-- 1 slowikow srlab 17M Feb 2 21:17 ALL.chr21.phase1.projectConsensus.genotypes.twk
-rw-rw-r-- 1 slowikow srlab 29K Feb 2 21:17 ALL.chr21.phase1.projectConsensus.genotypes.twk.twi
-rw-rw-r-- 1 slowikow srlab 301M May 23 2012 ALL.chr21.phase1.projectConsensus.genotypes.vcf.gz
-rw-rw-r-- 1 slowikow srlab 34K Feb 2 20:59 ALL.chr21.phase1.projectConsensus.genotypes.vcf.gz.tbi
Calculating LD failed:
tomahawk calc -pdi ALL.chr21.phase1.projectConsensus.genotypes.twk -o ALL.chr21.phase1.projectConsensus.genotypes -a 5 -r 0.1 -P 0.1 -c 990 -C 1 -t 28
Program: tomahawk beta-0.2-3-g850e04d5-master
Contact: Marcus D. R. Klarqvist <[email protected]>
Documentation: https://github.com/mklarqvist/Tomahawk
License: MIT
----------
[2018-02-02 21:19:12,375][LOG] Calling calc...
[2018-02-02 21:19:13,563][LOG][TOTEMPOLE] Found: 496 blocks...
[2018-02-02 21:19:13,563][LOG][TOTEMPOLE] Found: 1 contigs and 1,094 samples...
[2018-02-02 21:19:13,563][LOG][TOTEMPOLE] Found: 476,154 variants...
[2018-02-02 21:19:13,564][LOG][RLE] Samples: 1094 > 15... Skip
[2018-02-02 21:19:13,564][LOG][RLE] Samples: 1094 < 4095...
[2018-02-02 21:19:13,564][LOG][RLE] Using 16-bit width...
[2018-02-02 21:19:13,564][LOG][BALANCER] Case is diagonal (chunk 0/990)...
[2018-02-02 21:19:13,564][LOG][BALANCER] Total comparisons: 66 and per thread: 2
Segmentation fault (core dumped)