genome / pindel


Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data. It uses a pattern growth approach to identify the breakpoints of these variants from paired-end short reads.

License: GNU General Public License v3.0

Perl 2.44% SystemVerilog 38.57% Shell 0.16% C++ 56.74% C 0.09% Makefile 0.31% Python 1.68%

pindel's Introduction

Pindel

Compiling

To compile Pindel you need three things: GNU Make and GCC (which are usually already installed on Linux) and htslib. The last is not installed on Linux by default, but it can be retrieved with:

git clone https://github.com/samtools/htslib

htslib needs to be built before you can start installing Pindel. (Go to htslib's directory, and follow the directions in its 'readme'. At the time of writing this (February 2016) it simply works if you give the commands "make" and "sudo make install".)

To compile Pindel on OS X, you may need to do more work. A 'regular' installation under OS X may work, but in some cases OS X has problems with the OpenMP library that Pindel uses for speedup. In those cases, please follow the instructions on the following page to update your gcc:

http://www.ficksworkshop.com/blog/14-coding/65-installing-gcc-on-mac

If htslib has been cloned and installed, go to the pindel directory ([my-path]/pindel) and use the INSTALL script there in the following way:

./INSTALL [path-to-htslib]

for example

./INSTALL ../htslib

After this, you can run pindel by using

./pindel [options]

Plain "./pindel" without command line options will list all available command line options; the FAQ in the Pindel root directory includes a usage example.

If there are any problems with installing or running Pindel, you may be able to find the solution in the FAQ (the FAQ file stored in the same directory as the INSTALL script). Otherwise, feel free to open an issue on GitHub (https://github.com/genome/pindel/issues) or to contact the main author, Kai Ye, at [email protected]

pindel's People

Contributors

apregier, benoberkfell, dtrudg, ewlameijer, ewlameijer-xjtu, jasonwalker80, jmarshall, liangkaiye, mkroon1, nnutter, sleongmgi, tmooney


pindel's Issues

bam2pindel / Adaptor uninitialized value error

My Illumina sequencing data was aligned using BWA, converted to a BAM file using samtools normally, and the BAM file was then sorted the way the program suggests. I installed Pindel normally, and its sample data runs to completion without errors. However, when attempting to run bam2pindel with the previously mentioned data, I receive the following error:

Forked pipe to run: /home/medhat/bin/samtools view sorted_ByName.bam
Error while processing: /home/medhat/bin/samtools view sorted_ByName.bam
EVAL ERROR: Use of uninitialized value in array element at /path/pindel/pindel/Adaptor.pm line 92, <$PROC> line 1.

CHILD ERROR: 0

Reads in buffer:
$VAR1 = [];

Broke at record 1
at bam2pindel.pl line 341
main::run_parser('HASH(0x18ac4e0)') called at bam2pindel.pl line 83

segfault in GetRealStart4Deletion

Using 0.2.5a7, I get a segfault in GetRealStart4Deletion. It happened when the function was called with RealStart and RealEnd both 1; TheInput is 27731856 but starts with 101000 N characters.

pindel build failing on htslib.so.2

I have an installed version of htslib, and I built from source using ./INSTALL ../htslib. Running pindel then fails with:

./pindel: error while loading shared libraries: libhts.so.2: cannot open shared object file: No such file or directory

I would appreciate any help.
kindest regards,

Stephen

Bug in samtools 0.1.18, request update to newer version

Samtools version 0.1.18 has a number of bugs that cause pindel to segfault under certain conditions (e.g. using a different reference genome than the one used to align the bam file) with no descriptive error messages or clear diagnostic information. Samtools 0.1.18 is out of date and is no longer maintained. Can pindel transition to work with newer versions of samtools that have been debugged and/or are actively maintained?

Does not compile with samtools 1.2

Dear dev,

While running the install script:

$ samtools_HOME=/scratch/modules/samtools/1.2/bin/
$ ./INSTALL ${samtools_HOME}/bin

I got an error:

pindel.h:34:19: error: khash.h: No such file or directory

It worked fine with samtools 0.1.19.

Best wishes,
Fengyuan

Infinite loop if file does not exist

In file src/bddata.cpp line 42 (and elsewhere) a file is opened, but is not checked for existence beforehand, nor checked after open, e.g. CheckbdFileFirst.good()
If the file does not exist, an infinite loop condition occurs because CheckbdFileFirst.eof() is never true.
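
The failure mode described above can be avoided by checking the stream right after opening it and by looping on the read itself instead of on eof(). The following is a minimal sketch of that pattern, not Pindel's actual code: the function name readTokens and the choice to throw std::runtime_error are assumptions made for illustration.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not taken from src/bddata.cpp): reads all
// whitespace-separated tokens from a file and fails fast when the
// file cannot be opened.
std::vector<std::string> readTokens(const std::string& path) {
    std::ifstream in(path.c_str());
    if (!in.good()) {
        // A stream that failed to open has failbit set and will never
        // report eof(), so a loop guarded only by !in.eof() never ends.
        throw std::runtime_error("cannot open file: " + path);
    }
    std::vector<std::string> tokens;
    std::string token;
    // Loop on the extraction itself: it stops on end of file and on any
    // other read failure.
    while (in >> token) {
        tokens.push_back(token);
    }
    return tokens;
}
```

Whether to throw, return an error code, or print a message and exit is a design choice; the point is only that the open is verified before the read loop starts.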

make test expected to fail?

I'm installing pindel on our HPC cluster and on running the make test it fails with TargetOutput/colowobd15_D and ActualOutput/colowobd15_D not identical.

Just want to check if make test is expected to succeed? I note in the git repo the TargetOutput hasn't been updated for several years, while the code has.

The program appears to run OK.

INSTALL - option to statically link htslib

It would be useful to be able to build the binary and then remove htslib. Normally this is possible by just removing htslib/libhts.so* before compiling dependent tools, but that doesn't seem to work for pindel.

New Pindel is very aggressive

Hello Pindel community,

I've been playing around with the new pindel (0.2.5b8, 20151210) and am focusing on a set of 24 samples on a targeted panel. A preliminary analysis threw many more indels than an older version (0.2.5a3, Oct 24 2013) so I decided to do some comparative analysis.

In short, I ran the two versions of Pindel on 24 samples in single-sample mode, using similar parameter values. This meant that I ran the new Pindel with default parameters while changing the following ones for the older version:

--balance_cutoff 100
--minimum_support_for_event 1
--report_interchromosomal_events false
--max_range_index 1

An additional parameter was --min_distance_to_the_end which is not available in the older version. Peeking into an older branch however suggests the same default value of 8 (minClose in src/user_defined_settings.h)?

I later filtered the output for minimum 3 supporting reads and allele fraction >= 0.02 for both runs. The difference in the number of calls is substantial. I see 73115 calls in 24 samples using the new version and 29288 in the older one. Looking at some of the calls in the KIT gene I am certain I find more true positives AND also more false positives at the same time. Allele frequencies in general are bumped up and in some cases by a big margin.

Are there any other parameters I should watch out for? Is the newer pindel expected to throw a much larger number of calls?

samtools requirements

Please include the required samtools libs etc. in your repo, so users can git clone and just make.
It makes life nice and easy.

pindel2vcf error: unknown argument: –so

I am using Pindel to identify indels from tumour vs control. I would like to get the somatic p_values for the INDELs called using -so.
(-so/--somatic_p compute somatic p value when two samples are present, assume the order is normal and tumor. (default false))

my code:

pindel2vcf -P ../pindelfile -r ../ref.fasta -R hg19 -d 20160905 -so TRUE -v ../outputfile.vcf

I keep getting the error: unknown argument: –so

Can someone please help with this?
Many thanks
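
One thing worth ruling out: the quoted error shows the flag as "–so", with an en dash (U+2013) rather than an ASCII hyphen. That may only be a rendering artifact of this page, but a dash pasted from a word processor or web page would not be recognised as an option prefix. The sketch below shows how such an argument could be detected. The function name is hypothetical and this is not part of pindel2vcf.

```cpp
#include <string>

// Hypothetical check (not pindel2vcf code): returns true when `arg`
// begins with a UTF-8 encoded en dash (U+2013, bytes E2 80 93) or
// em dash (U+2014, bytes E2 80 94) instead of an ASCII '-'.
bool startsWithUnicodeDash(const std::string& arg) {
    if (arg.size() < 3) {
        return false;
    }
    const unsigned char b0 = static_cast<unsigned char>(arg[0]);
    const unsigned char b1 = static_cast<unsigned char>(arg[1]);
    const unsigned char b2 = static_cast<unsigned char>(arg[2]);
    return b0 == 0xE2 && b1 == 0x80 && (b2 == 0x93 || b2 == 0x94);
}
```

If this returns true for one of the arguments, retyping the dash by hand on the command line should rule out this cause.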

Pindel does not compile with samtools from git

I am trying to install Pindel, using the following steps:

First compile samtools:

mkdir /mnt/nexenta/haars001/projects/yeast_evolution/ && cd $_
git clone https://github.com/samtools/samtools.git && git clone https://github.com/samtools/htslib.git && cd samtools && make && ./samtools --version && cd -

Which nicely returns samtools 0.2.0-rc6.

Then try to install Pindel:

git clone https://github.com/genome/pindel && cd pindel && ./INSTALL /mnt/nexenta/haars001/projects/yeast_evolution/samtools/

With that I get a lot of errors (see below), as khash.h can't be found, although it is there (in htslib) :

jvh@assembly:/mnt/nexenta/haars001/projects/yeast_evolution$ find . -name khash.h
./htslib/htslib/khash.h

If I use the latest samtools (0.1.19) from Sourceforge, Pindel compiles fine.


Compilation errors :

jvh@assembly:/mnt/nexenta/haars001/projects/yeast_evolution$ git clone https://github.com/genome/pindel && cd pindel && ./INSTALL /mnt/nexenta/haars001/projects/yeast_evolution/samtools/
Cloning into 'pindel'...
remote: Reusing existing pack: 3760, done.
remote: Total 3760 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (3760/3760), 81.82 MiB | 777 KiB/s, done.
Resolving deltas: 100% (2830/2830), done.
Checking out files: 100% (161/161), done.
path is now: /mnt/nexenta/haars001/projects/yeast_evolution/samtools/
WARNING: Created default Makefile.local; please review it.
make: *** No rule to make target `Makefile.local', needed by `pindel'.  Stop.
If this is the first time you're running this install script please wait a moment as we create the Makefile.local
make -C src pindel
make[1]: Entering directory `/mnt/nexenta/haars001/projects/yeast_evolution/pindel/src'
In file included from control_state.h:29:0,
                 from genotyping.h:21,
                 from genotyping.cpp:23:
pindel.h:34:19: fatal error: khash.h: No such file or directory
compilation terminated.
In file included from control_state.h:29:0,
                 from assembly.h:21,
                 from assembly.cpp:23:
pindel.h:34:19: fatal error: khash.h: No such file or directory
compilation terminated.
In file included from search_MEI.cpp:15:0:
pindel.h:34:19: fatal error: khash.h: No such file or directory
compilation terminated.
In file included from search_MEI_util.h:26:0,
                 from search_MEI_util.cpp:21:
pindel.h:34:19: fatal error: khash.h: No such file or directory
compilation terminated.
In file included from fn_parameters.cpp:10:0:
pindel.h:34:19: fatal error: khash.h: No such file or directory
compilation terminated.
In file included from user_defined_settings.cpp:9:0:
pindel.h:34:19: fatal error: khash.h: No such file or directory
compilation terminated.
In file included from pindel_read_reader.h:13:0,
                 from pindel_read_reader.cpp:9:
pindel.h:34:19: fatal error: khash.h: No such file or directory
compilation terminated.
In file included from ifstream_line_reader.cpp:3:0:
pindel.h:34:19: fatal error: khash.h: No such file or directory
compilation terminated.
In file included from read_buffer.h:13:0,
                 from read_buffer.cpp:8:
pindel.h:34:19: fatal error: khash.h: No such file or directory
compilation terminated.
In file included from control_state.h:29:0,
                 from bddata.h:14,
                 from bddata.cpp:3:
pindel.h:34:19: fatal error: khash.h: No such file or directory
compilation terminated.
In file included from control_state.h:29:0,
                 from searchdeletions.cpp:22:
pindel.h:34:19: fatal error: khash.h: No such file or directory
compilation terminated.
In file included from control_state.h:29:0,
                 from searchshortinsertions.cpp:22:
pindel.h:34:19: fatal error: khash.h: No such file or directory
compilation terminated.
In file included from control_state.h:29:0,
                 from search_variant.cpp:29:
pindel.h:34:19: fatal error: khash.h: No such file or directory
compilation terminated.
In file included from farend_searcher.cpp:23:0:
pindel.h:34:19: fatal error: khash.h: No such file or directory
compilation terminated.
In file included from output_sorter.cpp:27:0:
pindel.h:34:19: fatal error: khash.h: No such file or directory
compilation terminated.
In file included from reporter.h:27:0,
                 from search_tandem_duplications_nt.cpp:23:
pindel.h:34:19: fatal error: khash.h: No such file or directory
compilation terminated.
In file included from reporter.h:27:0,
                 from search_tandem_duplications.cpp:23:
pindel.h:34:19: fatal error: khash.h: No such file or directory
compilation terminated.
In file included from bam2depth.cpp:11:0:
/mnt/nexenta/haars001/projects/yeast_evolution/samtools/bam.h:54:25: fatal error: htslib/bgzf.h: No such file or directory
compilation terminated.
In file included from reporter.h:27:0,
                 from search_inversions_nt.cpp:23:
pindel.h:34:19: fatal error: khash.h: No such file or directory
compilation terminated.
In file included from reporter.h:27:0,
                 from search_inversions.cpp:23:
pindel.h:34:19: fatal error: khash.h: No such file or directory
compilation terminated.
In file included from reporter.h:27:0,
                 from search_deletions_nt.cpp:23:
pindel.h:34:19: fatal error: khash.h: No such file or directory
compilation terminated.
In file included from control_state.h:29:0,
                 from control_state.cpp:22:
pindel.h:34:19: fatal error: khash.h: No such file or directory
compilation terminated.
In file included from refreader.cpp:26:0:
pindel.h:34:19: fatal error: khash.h: No such file or directory
compilation terminated.
In file included from searcher.cpp:22:0:
pindel.h:34:19: fatal error: khash.h: No such file or directory
compilation terminated.
In file included from reporter.cpp:32:0:
pindel.h:34:19: fatal error: khash.h: No such file or directory
compilation terminated.
In file included from reader.cpp:29:0:
/mnt/nexenta/haars001/projects/yeast_evolution/samtools/bam.h:54:25: fatal error: htslib/bgzf.h: No such file or directory
compilation terminated.
In file included from pindel.cpp:35:0:
pindel.h:34:19: fatal error: khash.h: No such file or directory
compilation terminated.
make[1]: Leaving directory `/mnt/nexenta/haars001/projects/yeast_evolution/pindel/src'
make[1]: Entering directory `/mnt/nexenta/haars001/projects/yeast_evolution/pindel/src'
g++  -Wall -g -c -O3 -fopenmp -I/mnt/nexenta/haars001/projects/yeast_evolution/samtools/ pindel.cpp -o pindel.o
In file included from pindel.cpp:35:0:
pindel.h:34:19: fatal error: khash.h: No such file or directory
compilation terminated.
make[1]: *** [pindel.o] Error 1
make[1]: Leaving directory `/mnt/nexenta/haars001/projects/yeast_evolution/pindel/src'
make: *** [pindel] Error 2

INSTALL failed
Possible reasons:
1. 'cannot cd to [path]
->the samtools path provided was incorrect
(so '../samtools/' was used instead of '/home/user/samtools/')

2. 'cannot find -lbam'
->samtools was not properly installed, in that case go to the samtools directory
and run 'make'.

For futher help, see the pindel wiki and its FAQ on https://trac.nbic.nl/pindel/wiki/PindelFaq
Or contact us on [email protected]

pindel2vcf error: the -co setting

I'm using pindel2vcf to convert the Pindel output into .vcf, but there is a problem when it runs:

Warning! The SV at chromosome chr2, position 89075984 is of size 7525316. It won't be put in the VCF in base sequence detail, but in the format 'chrom pos firstrefbase '. This behaviour can be overridden by setting the -co parameter to -1 or to a larger value than the current SV size.

However, when I set -co to -1, another problem arises:

argument of -co seems erroneous.
Required parameter -R/--reference_name The name and version of the reference genome needs to be set.
Required parameter -d/--reference_date The date of the version of the reference genome used needs to be set.

And when I set -co to a large number such as 10000000, the first problem still exists.
Does anyone know how I can deal with it? Thanks a lot.

a bug of Pindel : -p does not exist

Hello,
when I run "../pindel2vcf -r hs_ref_chr20.fa -R HUMAN_G1K_V2 -d 20100101 -P colontumor_E -e 5" in the demo folder, I get this error: "The pindel file (-p) does not exist".

So what happened?
Pindel version 0.2.5b6, 20150915.

Segmentation fault (core dumped)

Hi,

I am using Pindel version 0.2.5b9, 20160729 and I got a segmentation fault.
I have a genome with 2 chromosomes. It works fine with the first, but when starting the second it throws an error right away:
pindel -f bEYY1013_combined.fa -i BC72_pindel.txt -o output_bEYY1013/bEYY1013

[...]
Looking at chromosome chr2 bases 1 to 1072315 of the bed region: chromosome chr2:1-1072315
Erreur de segmentation (core dumped)

I tried a smaller window, down to 0.5, but got the same problem. I also tried -l false -k false, with the same result.
My BC72_pindel.txt is just this (with tab separation):
BC72_bEYY1013_sorted.bam 300 BC72_DEL

Have you ever seen this before?
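
When debugging a crash like this, it can help to first confirm that each configuration line really has the layout shown above: a BAM path, an insert size, and a sample label, separated by tabs. The following is a minimal sketch of such a check. It assumes that exactly three tab-separated fields are expected (inferred only from the example line in this report), and the struct and function names are hypothetical, not Pindel's own.

```cpp
#include <sstream>
#include <string>

// Hypothetical record for one configuration line; the field names are
// assumptions made for this example.
struct ConfigEntry {
    std::string bamPath;
    int insertSize;
    std::string label;
};

// Parses one line of the form "<bam>\t<insert size>\t<label>".
// Returns false if the line does not have exactly three tab-separated
// fields or if the insert size is not a positive integer.
bool parseConfigLine(const std::string& line, ConfigEntry& entry) {
    std::istringstream fields(line);
    std::string bam, size, label, extra;
    if (!std::getline(fields, bam, '\t')) return false;
    if (!std::getline(fields, size, '\t')) return false;
    if (!std::getline(fields, label, '\t')) return false;
    if (std::getline(fields, extra, '\t')) return false;  // too many fields
    if (bam.empty() || label.empty()) return false;

    std::istringstream number(size);
    int value = 0;
    char trailing = 0;
    // Reject non-numeric sizes, trailing junk such as "300x", and
    // non-positive values.
    if (!(number >> value) || (number >> trailing) || value <= 0) return false;

    entry.bamPath = bam;
    entry.insertSize = value;
    entry.label = label;
    return true;
}
```

A line whose fields are separated by spaces instead of tabs is rejected by this check, which is a common way for such a file to go wrong after copy and paste.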

pindel

What does the output pindel_RP refer to?

Multiple samples

Dear Kai,

I have a question about running with multiple samples. I found that memory usage increases with the number of samples in the configuration file, so I plan to list only a small number of samples in the configuration file per run. I am wondering whether Pindel's results will differ between a run with many samples and runs with a single sample or a few samples.

Thanks,
Ming Fang

[SOLVED] issue with compiling pindel on OS X

Hello! I am having some issues with compiling pindel. I have searched Google many times and read your FAQ hoping to resolve the issues, but with no luck.

I have re-installed the 6.1.0 versions of gcc and g++ via homebrew --without-multilib

I am on an OS X 10.11 El Capitan.
Tried to make sure I was using the newer version of gcc by the command:

$alias g++='g++-6'
$gcc='gcc-6'


$cd src
make clean
make CXX=g++-6

$./INSTALL ~/coolstuff/samtools-1.3.1/

path is now: /Users/elisaur/coolstuff/samtools-1.3.1/
make -C src pindel
c++  -I/Users/elisaur/coolstuff/htslib -Wall -g -c -O3 -fopenmp bddata.cpp -o bddata.o
clang: error: unsupported option '-fopenmp'
make[1]: *** [bddata.o] Error 1
make: *** [pindel] Error 2

mv: rename src/pindel to ./pindel: No such file or directory
mv: rename src/pindel2vcf to ./pindel2vcf: No such file or directory
mv: rename src/sam2pindel to ./sam2pindel: No such file or directory
mv: rename src/pindel2vcf4tcga to ./pindel2vcf4tcga: No such file or directory

Pindel successfully compiled. The pindel executable can be found in this directory.


Thanks for your help!

Warn user when other reference genome is used for alignment

Dear Pindel team,

When I was running my analysis, I found the following error:

BAM file index  0
Bam file name   /data/700232000002.dedup.realign.bam
Number of split-reads so far    197901

The number of one end mapped read: 197901
There are 1507751 reads supporting the reference allele.
There are 1 samples.
SampleName2Index done
declaring g_RefCoverageRegion for 1 samples and 5000001 positions.
There are 197901 split-reads for this chromosome region.

There are 0 split-reads mapped by aligner.
search far ends
terminate called after throwing an instance of 'std::out_of_range'
  what():  basic_string::at
/LTstorage/script.sh: line 4: 39014 Aborted                 (core dumped) '/usr/local/pindel/pindel-0.2.5b8/pindel' '--fasta' '/usr/local/Genomes/H.Sapiens/GRCh38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna' '--config-file' '/LTstorage/700232000002.pindel.cfg' '--output-prefix' '/LTstorage/700232000002/sample' '--number_of_threads' '4'

(also reported here: https://www.biostars.org/p/174201/#174353)

After analysing the problem, it showed that I was using a wrong reference fasta with pindel.

To prevent such errors in the future, it would be nice to warn the user when the supplied --reference_fasta is not the same as the one used during alignment. Pindel should check the sizes of the chromosomes (in the @SQ fields of the BAM header) and match them against the supplied --reference_fasta.
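The check proposed above could be sketched outside Pindel as follows. This is only an illustration, not Pindel code: it assumes the BAM header is available as text (for example from `samtools view -H`) and that the reference has a `.fai` index (from `samtools faidx`). The function names are hypothetical.

```python
def parse_sq_lengths(sam_header_lines):
    """Return {contig: length} from the @SQ lines of a SAM/BAM text header."""
    lengths = {}
    for line in sam_header_lines:
        if not line.startswith("@SQ"):
            continue
        # Each tab-separated field after "@SQ" looks like "SN:chr1" or "LN:1000".
        fields = dict(
            field.split(":", 1) for field in line.rstrip("\n").split("\t")[1:]
        )
        lengths[fields["SN"]] = int(fields["LN"])
    return lengths


def parse_fai_lengths(fai_lines):
    """Return {contig: length} from the lines of a .fai index (name, length, ...)."""
    lengths = {}
    for line in fai_lines:
        if line.strip():
            name, length = line.split("\t")[:2]
            lengths[name] = int(length)
    return lengths


def reference_mismatches(bam_lengths, fasta_lengths):
    """List human-readable differences between the two contig tables."""
    problems = []
    for name, length in bam_lengths.items():
        if name not in fasta_lengths:
            problems.append(f"{name}: in BAM header but not in FASTA")
        elif fasta_lengths[name] != length:
            problems.append(
                f"{name}: BAM header says {length}, FASTA says {fasta_lengths[name]}"
            )
    return problems


if __name__ == "__main__":
    # Made-up header and index lines, purely for illustration.
    header = ["@SQ\tSN:chr1\tLN:1000\n", "@SQ\tSN:chr2\tLN:500\n"]
    fai = ["chr1\t1000\t6\t60\t61\n", "chr2\t450\t1030\t60\t61\n"]
    for problem in reference_mismatches(parse_sq_lengths(header), parse_fai_lengths(fai)):
        print("WARNING:", problem)
```

Running such a check before starting Pindel would turn the `std::out_of_range` crash into an explicit warning about the mismatching contig.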

Insert size with decimal points creates confusion in output

Hi,

I was not sure how important the insert size was in the algorithm, so I took the mean insert size calculated by breakdancer which has two digits after the decimal point. However, I just realized that pindel seemed to consider decimal points a separator, therefore, the two digits were actually regarded as sample id. This would be a problem for somatic filtering because we will have no idea which column is normal or tumor unless we went back to the config files because they were sorted in lexical order. Could you add a reminder in the manual or FAQ about not having decimal points for insert size?
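Until the manual mentions this, a config file can be checked before running Pindel. The sketch below is a hypothetical helper, not part of Pindel; it assumes the three-column layout "BAM path, insert size, sample name" and simply rejects any insert size that is not a whole number.

```python
def parse_config_line(line):
    """Split one 'bam insert_size sample' line and insist on an integer insert size."""
    fields = line.split()
    if len(fields) != 3:
        raise ValueError(
            f"expected 3 whitespace-separated fields, got {len(fields)}: {line!r}"
        )
    bam_path, insert_size, sample = fields
    if not insert_size.isdigit():
        # A value such as "312.47" ends up here; round it before writing the config.
        raise ValueError(f"insert size must be a whole number, got {insert_size!r}")
    return bam_path, int(insert_size), sample


if __name__ == "__main__":
    print(parse_config_line("tumor.bam\t312\tTUMOR"))
    mean_insert = "312.47"  # e.g. a mean reported with decimals by another tool
    print(parse_config_line(f"tumor.bam\t{round(float(mean_insert))}\tTUMOR"))
```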

Running in parallel (per chromosome) yields different output

Hello Authors,

I am running Pindel on 30X WGS Illumina trio data, and I am running into a weird issue.
I followed the steps from the manual, that is:
Step 1. Prepared pindel files from mapped BAM files using "/sam2pindel", say sample.wgs.pindel.txt
Step 2. Ran Pindel using "sample.wgs.pindel.txt"
Step 3. Finally ran pindel2vcf to get the VCF

I tried two different ways to perform steps 1 and 2:
Method 1. Generate a separate pindel file for each chromosome in step 1, and in step 2 run pindel separately for each chromosome
Method 2. Generate a single pindel file in step 1, and in step 2 run pindel separately for each chromosome

I expected both methods to yield identical results, but that is not the case. With Method 2 I got almost twice the number of variants compared to Method 1. Is this expected?
What is the difference between these two methods?
Which method is correct or preferred?

Thanks,
Nick

segfaults when m_breakDancerMask is accessed outside itself

When -Q (reporting breakdancer events) is turned on, pindel segfaults in BDData::isBreakDancerEvent in bddata.cpp; this was also reported by 'aguitarfreak' on Biostars. For both of us it happens at m_breakDancerMask[ rawLeftPosition - m_currentWindow.getStart() ] (or rawRightPosition - m_currentWindow.getStart()), where rawLeft/RightPosition is outside the m_breakDancerMask[ m_currentWindow.getStart ] array size.

isBreakDancerEvent: initialized : rawLeft 17734564 window.getstart: 15097000 rawRight: 17734644
isBreakDancerEvent: initialized : rawLeft 17888942 window.getstart: 15097000 rawRight: 17889750
isBreakDancerEvent: initialized : rawLeft 8017842 window.getstart: 15097000 rawRight: 18783914
[segfaults here of course]
For the Biostars person it was rawLeft 4239970296 - window.getstart 55097000
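The failure mode can be illustrated with a small bounds check. This is a Python sketch of the idea only, not the C++ in bddata.cpp; `mask_length` is a hypothetical value, since the real mask size is not stated in this report.

```python
def mask_index(position, window_start, mask_length):
    """Return the offset into the window mask, or None if it falls outside the mask."""
    offset = position - window_start
    if 0 <= offset < mask_length:
        return offset
    return None


if __name__ == "__main__":
    window_start = 15097000
    mask_length = 5_000_000  # assumed size, for illustration only
    for raw_left in (17734564, 17888942, 8017842):
        print(raw_left, "->", mask_index(raw_left, window_start, mask_length))
```

With the values from the log above, the third position (8017842) lies before the window start, so the offset is negative and an unchecked array access would read outside the mask.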

Pindel Output files

Hi,
I have two output files ending in '_INT_final' and '_CloseEndMapped' which I cannot seem to find any information about. What are these files and what should they contain (both are empty)?

Thanks

release tarball without test and demo data

Hi

I am helping to develop bcbio-nextgen: https://github.com/chapmanb/bcbio-nextgen, and we would like to add your tool to our structural variant detection workflow.

We have an automatic deployment for the tools we use, and would like to add this one as well. We want to avoid including test data in the installation, so we would like to know whether you would consider creating a release of pindel without this folder. That way we can create a homebrew formula that installs only the tool, and not the test data.

thanks a lot!

Question: Disagreement in the coordinates of VCF and internal formats for Pindel

I am observing strange discrepancies between the information present in the VCF created by Pindel and the internal file format. Here is what I did:

  • I downloaded Pindel from github 2 days ago with command:
    git clone https://github.com/genome/pindel.git

  • I ran pindel on a sample BAM file using as reference the human GRCh37 g1k_v37 decoy genome sequence.

  • I noticed that the VCF file does not have the support information, score, or quality measures, so I decided to recover it by merging the VCF file with the lines from the internal format. I recovered those lines with grep ChrID internal_file

  • However, not all the coordinates matched. For one particular case I had in the VCF:

1 10290621 . ATAGCTGGGATTACAGGTGTGTGCCACCACACCTGGTTAATTTTTGTATTTTTAATAGAGACGGGGTTTCACCGTGTTGGCTAGGCTGGTCTTGAT GTACTTGGGATTACTGGCGTACGCCACCACGCCCAGCTAATTTTTGTATTTTTAGTAGAGACGGGGTTTCACCATGTCAACCAGGCTGGTCTCGAA . PASS END=10290715;HOMLEN=0;SVLEN=-96;SVTYPE=RPL;NTLEN=96 GT:AD 0/0:0,1

and the internal format:

3305 D 96 NT 96 "GTACTTGGGATTACTGGCGTACGCCACCACGCCCAGCTAATTTTTGTATTTTTAGTAGAGACGGGGTTTCACCATGTCAACCAGGCTGGTCTCGAA" ChrID 1 BP 10290620 10290717 BP_range 10290620 10290717 Supports 1 1 + 0 0 - 1 1 S1 2 SUM_MS 99 1 NumSupSamples 1 1 pFDA_simTruth_76x_0.4_FEMALE 0 0 0 0 1 1
  • Notice that in the VCF the position for the variant is 10290621 and the internal file says 10290620. I went to the UCSC Genome Browser and checked that the REF sequence is 96 bp, starts at 10290621 and ends at 10290716.

  • So now I have the following discrepancies for the (begin, end): reference (10290621,10290716). VCF (10290621, 10290715). internal (10290620, 10290717)

  • I repeated the exercise with a line where the start position matches:

1 10289908 . GGAGTGAGACTCCGTCTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAGAAAAGAAAATTAGGGGCCAGACGTGGTGGCTCACACCTATAATCCCAGC GGAGTGAGACTCCGTCTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAGAAAAGAAAATTAGGGGCCAGACGTGGTGGCTCACACCTATAATCCCAGCTATTCAGGAGGCTGAGGCAGGAGAATCACTTGAACCCAGGAGGTGGAGGTTGCAGTGAGCTGAGATCGCACCACTGCACTCCAGCCTGGGTCACAGAGTGAGACTCCGTCTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAGAAAAGAAAATTAGGGGCCAGACGTGGTGGCTCACACCTATAATCCCAGC . PASS END=10290003;HOMLEN=0;SVLEN=95;SVTYPE=DUP:TANDEM;NTLEN=95 GT:AD 0/0:0,1

168 TD 95 NT 95 "TATTCAGGAGGCTGAGGCAGGAGAATCACTTGAACCCAGGAGGTGGAGGTTGCAGTGAGCTGAGATCGCACCACTGCACTCCAGCCTGGGTCACA" ChrID 1 BP 10289908 10290004 BP_range 10289908 10290004 Supports 1 1 + 0 0 - 1 1 S1 2 SUM_MS 99 1 NumSupSamples 1 1 pFDA_simTruth_76x_0.4_FEMALE 0 0 0 0 1 1

and in this case the (begin, end) coordinates are reference (10289908, 10290003), VCF (10289908, 10290003), and internal format (10289908, 10290004)

  • I cannot make sense of it.
    • In the first case the VCF coordinates are wrong and the internal format coordinates seem to be flanking.
    • In the second case the VCF coordinates are correct but the internal format coordinates are not flanking.

The user manual does not explain anything of this, so I am clueless. Any help is appreciated.
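The arithmetic behind these comparisons can be written out explicitly. The sketch below only does the coordinate bookkeeping for 1-based positions; whether Pindel's internal BP values are meant as flanking positions is an assumption here, not something the manual confirms. The helper names are hypothetical.

```python
def inclusive_end(start, length):
    """Last base of a 1-based interval of the given length that begins at start."""
    return start + length - 1


def flanking_positions(start, length):
    """Bases immediately before and after a 1-based interval."""
    return start - 1, start + length


def bases_between(left, right):
    """Number of bases strictly between two positions."""
    return right - left - 1


if __name__ == "__main__":
    # First case: REF is 96 bp starting at 10290621.
    print(inclusive_end(10290621, 96))        # end of the REF sequence
    print(flanking_positions(10290621, 96))   # compare with internal BP 10290620 10290717
    # Both internal BP pairs, read as flanking positions:
    print(bases_between(10290620, 10290717))  # first case, reported size 96
    print(bases_between(10289908, 10290004))  # second case, reported size 95
```

Read this way, both internal BP pairs enclose exactly the reported SV length, while the END value in the first VCF record is one less than the inclusive end of the 96 bp REF sequence.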

Fix for segmentation fault if chromosome name not in reference fasta file

In file src/user_defined_settings.cpp lines 148 and 149,
m_endDefined = true; and correctParse = true; are defined after the loop, regardless of whether the sequence name was found. If the sequence name is not found, m_end will remain at 1 causing a later segmentation fault. They should only be set to true inside the loop if a match is found:
if (g_ChrNameAndSizeAndIndex[index].ChrName == m_targetChromosomeName) {
    m_end = g_ChrNameAndSizeAndIndex[index].ChrSize;
    m_endDefined = true;
    correctParse = true;
}

How to call somatic indels using Pindel

Hi,
I am trying to use Pindel to call somatic indels, but so far I have not found any useful information on how to do it. Can you give me some details on how to use Pindel to call indels? Some pipeline commands would be even better.
I have a tumor.sort.bam file and a normal.sort.bam file. What commands should I use to call
somatic indels (just indels)?
Thank you very much!
@liangkaiye thanks!

unique supporting read counts in pindel >v0.2.5b5

I recently did a comparison of pindel v0.2.5b8 and v0.2.5a7. I noticed that for deletions, small insertions, and tandem duplications, the unique supporting read counts always match the total supporting read counts in v0.2.5b8, but this is not the case in v0.2.5a7. I believe the source of this issue is the below line in a commit from 7/26/15, which requires reads to have the same name in order to be considered duplicates. This line is in a function MarkDuplicates in reporter.cpp which is later used for SI, TD, and deletions/DI, but not inversions, which is consistent with the observed behavior.
33943f7#diff-535281b8d88bf426c63d1f7988dd461dR966

It's simple for me to delete the line and recompile, but I wanted to alert you to this issue and double check that I'm not misunderstanding something. Thank you for your help!

Bloated ref counts for variant size <= 2

Dear Kai and other Pindel authors,

Recently I have spent much time fixing allele fraction values for Pindel calls. This may apply equally to old and new versions of Pindel and depends only on the size of indel in question, being a problem only for indel sizes <= 2

Ref count in the isRefRead function is calculated based on edit distance value of individual reads. If edit distance <= NM (default of 2), the read is considered a reference read. This means a read with a 1bp indel (and no other edits) is also considered reference read. This bloats reference read counts and often homozygous nulls are reported as heterozygous.

I'm trying to fix this by calculating total read count (rather than ref read count) using the same code flow as for the alt allele read handling. I've duplicated the build_record_SR function where I'm including not just the "isWeirdRead" reads. I've duplicated the flush function too with the following change:

            if (m_rawreads[i].hasCloseEnd()) {
                updateReadAfterCloseEndMapping(m_rawreads[i]);
            } // I update the read but irrespective of having a close end I add it to the total read count
            #pragma omp critical
            {
                m_rawreads[i].SampleName2Number.insert(std::pair <std::string, unsigned> (m_rawreads[i].Tag, 1));
                m_filteredReads.push_back(m_rawreads[i]);
            }

In short, I update the read but, irrespective of having a close end I add it to the total read count. This helps me include both reference and alternate allele containing reads. However, many reads may get filtered out while calculating total alt count which are not being filtered out now. Hence I still get a small skew for AF values (peaks close to 40% and 80%).

Can somebody advise on how best to modify the code base for a more accurate assessment of AF values? I'm happy to share my changes if I have something that is worth incorporating.

Thanks,
--Angad.

translocations only reported between 2 chromosomes

On my rice samples, translocations are only being reported between Chr1 and Chr10. In the stdout I see near-equal numbers of lines like
adding interchr Chr2 25750649 25751533 884 Chr3 23349705 23350589
involving all chromosomes, so I'm sure it's not a true result. The available reference is a different strain, and translocations are expected.

I can make example data available, but I don't have a good idea of how to cut the size down for translocations. A BAM file is ~15 GB.

pindel_RP

Hello,

I understand the pindel_* output files represent different SV events. However, I am not sure what the pindel_RP file contains. Here's a sample:

$ head pindel_RP
I	4998701	4999851	-	1150	I	4999696	5000346	-	650	995	Support: 52		sample 52
I	735811	737111	+	1300	I	736678	737828	+	1150	867	Support: 6		sample 6
I	3602740	3604040	+	1300	I	3603403	3604553	+	1150	663	Support: 10		sample 10
I	4998701	4999851	-	1150	I	4999716	5000366	-	650	1015	Support: 22		sample 22
I	3602705	3604005	+	1300	I	3603538	3604688	+	1150	833	Support: 35		sample 35

Would appreciate clarification.

Thanks,
Ahmad

Running pindel individually per sample (1500 samples high coverage)

Hi everyone,

I would like to identify SVs in 1.5k whole-genome samples using Pindel. It was impossible to run all of them at once; I even tried with individual chromosomes, but that did not work.

After getting all output files per sample, I merged the output files into one file and, based on start, stop, chrID, SV type and LengthOfSV, merged all SVs with their supporting samples.

For example:
chrId1 start1 stop1 length1 {sample x}
chrId2 start2 stop2 length2 {sample x}
chrId1 start1 stop1 length1 {sample y}

I merge them into:
chrId1 start1 stop1 length1 {sample x} {sample y} (merged)
chrId2 start2 stop2 length2 {sample x}

So far so good, it kind of works. But what I have found is that there are several SVs that I suppose should be the same, but because they differ by a few nucleotides (for example in start, stop, or length of SV) I get them as if they were different structural variations.

Can anyone give me some advice on what I should do? Does my approach make sense?
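One common way to handle the near-identical calls described above is to merge calls whose breakpoints agree within a tolerance rather than exactly. The sketch below is one possible approach under stated assumptions, not a Pindel feature: the tuple layout, the `tolerance` default of 10 bp, and the function name are all hypothetical.

```python
from collections import defaultdict


def merge_calls(calls, tolerance=10):
    """Greedily merge (chrom, sv_type, start, stop, sample) calls with nearby breakpoints."""
    by_key = defaultdict(list)
    for chrom, sv_type, start, stop, sample in calls:
        by_key[(chrom, sv_type)].append((start, stop, sample))

    merged = []
    for (chrom, sv_type), group in sorted(by_key.items()):
        group.sort()
        clusters = []
        for start, stop, sample in group:
            last = clusters[-1] if clusters else None
            # Compare against the first call of the most recent cluster only.
            if (last is not None
                    and abs(start - last["start"]) <= tolerance
                    and abs(stop - last["stop"]) <= tolerance):
                last["samples"].add(sample)
            else:
                clusters.append({"start": start, "stop": stop, "samples": {sample}})
        for cluster in clusters:
            merged.append(
                (chrom, sv_type, cluster["start"], cluster["stop"], sorted(cluster["samples"]))
            )
    return merged


if __name__ == "__main__":
    calls = [
        ("chr1", "D", 100, 200, "x"),
        ("chr1", "D", 103, 198, "y"),
        ("chr2", "D", 500, 900, "x"),
    ]
    for row in merge_calls(calls):
        print(row)
```

This greedy version only compares each call with the most recent cluster, so it can miss matches in dense regions; a reciprocal-overlap criterion is a frequently used alternative.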

Best,
Mehmet

Pindel memory usage

Hello,

I am trying to call indels on a paired tumor/normal BAM file, and am running into very high memory usage issues (currently at 46Gb). I am wondering if this is a memory leak, or if there is some way I could tune the parameters to avoid this. I am running as follows:

python MakePinDelConfig.py -B <tum.bam> -N <norm.bam> -T TumorSample -S NormalSample -I 500

pindel -f <ref.fasta> -i bam_configuration_file.txt -o raw_pindel_format -c ALL --window_size 0.1 -E 0.99 -k true -l true -N true

Many thanks for any help / suggestions!
Jeremiah

Pindel-C

I saw your excellent paper in Nature Medicine (doi:10.1038/nm.4002) introducing Pindel-C, but it is unclear where to find this software.

I am currently using pindel 0.2.5b8, which is calling complex indels in my data. Are the algorithmic improvements of Pindel-C available in this version, or is Pindel-C a different package entirely? And if so, from where can I obtain it?

many thanks!

Allow index to be FILE.bai, not only FILE.bam.bai

My BAMs are indexed, and the files are named FILE.bam and FILE.bai. Running pindel gives this:

I cannot find the bam index-file 'FILE.bam.bai' that should accompany the file FILE.bam mentioned in the configuration file FILE.pindelconfig. Please run samtools index on FILE.bam.

It would be great if pindel could look for bai files with this naming convention, which is default coming out of GATK tools upstream.
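The lookup being requested amounts to trying both naming conventions in order. A minimal sketch follows; it is illustrative only, Pindel itself is C++, and the function names are hypothetical.

```python
import os


def candidate_index_paths(bam_path):
    """Index names to try: FILE.bam.bai first, then FILE.bai."""
    root, _ext = os.path.splitext(bam_path)
    return [bam_path + ".bai", root + ".bai"]


def find_bam_index(bam_path, exists=os.path.exists):
    """Return the first existing index path, or None if neither is present."""
    for candidate in candidate_index_paths(bam_path):
        if exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    print(candidate_index_paths("FILE.bam"))
    print(find_bam_index("FILE.bam"))
```

Until something like this is supported, a symbolic link from FILE.bam.bai to FILE.bai is a workaround that needs no code changes.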

Interpreting version 0.2.5b8 output

How do I interpret the output of Pindel (version 0.2.5b8)? It's different from the former version.
E.g.:

0 D 1 NT 0 "" ChrID MT BP 317 319 BP_range 317 326 Supports 9 9 + 6 6 - 3 3 S1 28 SUM_MS 540 2 NumSupSamples 2 2 W-20G 4465 4465 4 4 1 1 W-C 2939 2939 2 2 2 2

In the last few columns, "W-20G 4465 4465 4 4 1 1 W-C 2939 2939 2 2 2 2", what does "4465" mean? Are there 4465 reads mapping upstream of the deletion?

Thanks in advance!

Does the depth impact the number of SVs (inversely)?

Dear Sir,

Before using pindel, I ran a test with 1 sample at 10X-50X.

What I found is that the number of SVs decreases as the sequencing depth increases.

This puzzles me a lot, and I hope you can tell me why.

Thanks, and I truly hope to get a solution.

Small Test deprecation

Are the results in the Small Test outdated? I am getting different results when I am running the commands inside the make file than are output into the TargetOutput file. If the results are no longer accurate, if you could make note of that, it would be appreciated.

https://github.com/genome/pindel/tree/master/test/SmallTest

However, when I run the commands and compare the results from the devtools run_regressiontests.sh, it matches.

Two weird problems about Pindel

Hi, I'm running 21 samples at the same time using Pindel.
Two very weird things:

  1. In my config file I have 21 samples, but the pindel2vcf results only contain genotype information for 20 samples! Why is one sample missing?
    The following is my code:

/project/umw_robert_brown/guang/pindel/pindel-0.2.5b8/pindel -T 20 -f /project/umw_robert_brown/guang/data/hg19/g1k_37.fasta -i config_020_ctrl20.txt -c $CHR -x 5 -o als020_ctrl20_$CHR
/project/umw_robert_brown/guang/pindel/pindel-0.2.5b8/pindel2vcf -P als020_ctrl20_$CHR -r /project/umw_robert_brown/guang/data/hg19/g1k_37.fasta -R ALSGenomes -d 20160307 -v als020_ctrl20_$CHR.vcf

I read the std_output from Pindel, which says "there are 20 samples" (but clearly there are 21)

  2. Why are genotypes ALWAYS 0/0 for all samples in the pindel2vcf output?

12 57954451 . CT C . PASS END=57954452;HOMLEN=4;HOMSEQ=TTTT;SVLEN=-1;SVTYPE=DEL GT:AD 0/0:16,0 0/0:12,1 0/0:9,0 0/0:7,0 0/0:7,0 0/0:8,0 0/0:15,0 0/0:8,0 0/0:9,0 0/0:6,0 0/0:4,0 0/0:5,0 0/0:8,0 0/0:9,0 0/0:6,0 0/0:3,0 0/0:17,0 0/0:5,0 0/0:4,0 0/0:6,0

I would expect at least one of them to show "0/1".

Thanks

The format of pindel output file

Dear Kai and other Pindel authors

In the variant reported, what is the meaning of 'NGS160714_01 2 2 10 10 5 5'?
It lies in the last part of the header line, for example:
####################################################################################################
18 D 1 NT 0 "" ChrID NC_000001.11 BP 2437459 2437461 BP_range 2437459 2437463 Supports 15 15 + 10 10 - 5 5 S1 66 SUM_MS 900 1 NumSupSamples 1 1 NGS160714_01 2 2 10 10 5 5
CTAGGGTTGCTCCATGCAGTGCCCAGCTCCTACTCCTGTCCAAGACTGACTTAGACCTCCTCTGGCCAGCTGGACAGCTCTGCCCAAATCTCAAATTCATCATCCCTGAGGACTCAACCTCAGACCCTGACTCCAGGCCTCCCTGCTGGGCaAATAGCACCCACCGCAACTAGGGGGCCCAGATCCTGGGAACACCCTCCCGCCCACCATCCGACTCAGCCTGGGGGTTTCTCTCCTGTCCCATGTCCCACTCTTTGCTCCACCCCTGCAGCCTCCACCCCTTCAGAACCACCCTCAAGTCA
ATCATCCCTGAGGACTCAACCTCAGACCCTGACTCCAGGCCTCCCTGCTGGGC AATAGCACCCACCGCAACTAGGGGGCCCAGATCCTGGGAACACCCTCCCGCCCACCATCCGACTCAGCCTGGGGGTTTCTCTCCTGTCCCATGTCCCA + 2437258 60 NGS160714_01 @M04057:26:000000000-ARA6H:1:2113:26461:10237/1
AATTCATCATCCCTGAGGACTCAACCTCAGACCCTGACTCCAGGCCTCCCTGCTGGGC AATAGCACCCACCGCAACTAGGGGGCCCAGATCCTGGGAACACCCTCCCGCCCACCATCCGACTCAGCCTGGGGGTTTCTCTCCTGTCCCATG + 2437378 60 NGS160714_01 @M04057:26:000000000-ARA6H:1:2110:2867:17132/1 .....

Thanks!

Suboptimal gap opening

I've been running into situations with pindel where it seemed to call short indels at wrong loci but in the general vicinity of the true variant. At a glance, they seemed to be ambiguous representations and I ignored them. Now I can see that they are actually incorrect. Why does it pick a deletion accompanied by four mismatches, while a perfectly-matching deletion of the same size is sitting next door? Does this algorithm pick the first solution that satisfies all criteria? How can I persuade it to do the right thing without the loss of sensitivity?

Example (the first alignment is by pindel, the second is a perfect match):

AATTTTTCCTataATAAAACAAATAATGTTAAAATGTTA
AATTTTTCCT---ATAATAAAACAAATGTTAAAATGTTA
AATTTTTCCTATAATAAAACA---AATGTTAAAATGTTA

Pindel version 0.2.5b9, 20160729, with default parameters
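The difference between the two alignments can be quantified by counting mismatches outside the gap. The sketch below assumes that the first of the three lines above is the reference (with the bases deleted by Pindel in lower case) and that the other two are the read under each alignment; the helper name is hypothetical.

```python
def count_mismatches(reference, aligned_read):
    """Count mismatching columns between equal-length strings, ignoring '-' gaps in the read."""
    if len(reference) != len(aligned_read):
        raise ValueError("alignment rows must have the same length")
    return sum(
        1
        for ref_base, read_base in zip(reference.upper(), aligned_read.upper())
        if read_base != "-" and ref_base != read_base
    )


if __name__ == "__main__":
    reference = "AATTTTTCCTataATAAAACAAATAATGTTAAAATGTTA"
    pindel_alignment = "AATTTTTCCT---ATAATAAAACAAATGTTAAAATGTTA"
    alternative = "AATTTTTCCTATAATAAAACA---AATGTTAAAATGTTA"
    print(count_mismatches(reference, pindel_alignment))  # 4
    print(count_mismatches(reference, alternative))       # 0
```

Under that reading, the Pindel placement costs four mismatches while the alternative 3 bp deletion costs none, which matches the description above.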

feature request: support space in filepaths in BAM config file

We've worked around this by making symlinks, but it would be nice if the BAM config file were more robust when parsing the first column with the BAM filepaths. Perhaps it could either respect escaped spaces (Spa\ ce), or allow quoting the filepath?
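Both behaviours requested above (backslash-escaped spaces and quoted paths) correspond to shell-style tokenisation. As an illustration of the idea, and not of how Pindel parses its config, Python's standard `shlex.split` handles both; the function name and example paths are hypothetical.

```python
import shlex


def split_config_line(line):
    """Split a 'bam insert_size sample' line, honouring quotes and backslash escapes."""
    fields = shlex.split(line)
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
    bam_path, insert_size, sample = fields
    return bam_path, int(insert_size), sample


if __name__ == "__main__":
    print(split_config_line(r"/data/Spa\ ce/tumor.bam 300 TUMOR"))
    print(split_config_line('"/data/Spa ce/tumor.bam"\t300\tTUMOR'))
```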

What is location/position of NT_seq in the Tandem Duplication ?

I am trying to convert Pindel tandem duplication output to a VCF record.
Q1] Is the NT_sequence, i.e. the non-template sequence, always in between the two duplicated sequences?
For example:
Pindel Tandem duplication output:
6 TD 335 NT 5 "AGGAA" ChrID chr1 BP 121485088 121485424 BP_range 121485088 121485424 Supports 4 3 + 2 1 - 2 2 S1 9 SUM_MS 199 1 NumSupSamples 1 1 sample 2 1 2 2

So, the tandem duplication, will it have following arrangement ?

DNA_stretch(121485088-121485424)-AGGAA-DNA_stretch(121485088-121485424)

As I understand it, the pindel2vcf script does not report breakpoints in the VCF file. It reports the Ref sequence and the Alt sequence.

Q2] When Pindel reports the start position of a tandem duplication, is it an inclusive or exclusive position? I want to know whether the allele at the start position is also duplicated. According to pindel2vcf's output, the allele at the start position is not duplicated.
Can you please explain this point?
For example,
the allele at start position '121485088' is 'T'. This is not duplicated if we look at the ALT column.
According to the pindel2vcf output, NT_seq is in between the two duplicated sequences.
Pindel Tandem duplication output:
6 TD 335 NT 5 "AGGAA" ChrID chr1 BP 121485088 121485424 BP_range 121485088 121485424 Supports 4 3 + 2 1 - 2 2 S1 9 SUM_MS 199 1 NumSupSamples 1 1 sample 2 1 2 2

Pindel2VCF output for the above record:
chr1 121485088 . tgaattctcagtaacttccttgtgttgtgtgtattcaactcacagagttgaacgatcctttacacagagcagacttgaaacactctttttgtggaatttgcaagtggagatttcagccgctttgaggtcaatggtagaaaaggaaatatcttcgtatagaaacaagacagaatgattctcagaaactcctttgtgatgtgtgcgttcaactcacagagtttaacctttcttttcatagagcagttaggaaacactctgtttgtaaagtctgcaagtggatattcagacctccttgaggccttcgttggaaacgggatttcttcatattctgctaga tgaattctcagtaacttccttgtgttgtgtgtattcaactcacagagttgaacgatcctttacacagagcagacttgaaacactctttttgtggaatttgcaagtggagatttcagccgctttgaggtcaatggtagaaaaggaaatatcttcgtatagaaacaagacagaatgattctcagaaactcctttgtgatgtgtgcgttcaactcacagagtttaacctttcttttcatagagcagttaggaaacactctgtttgtaaagtctgcaagtggatattcagacctccttgaggccttcgttggaaacgggatttcttcatattctgctagaAGGAAgaattctcagtaacttccttgtgttgtgtgtattcaactcacagagttgaacgatcctttacacagagcagacttgaaacactctttttgtggaatttgcaagtggagatttcagccgctttgaggtcaatggtagaaaaggaaatatcttcgtatagaaacaagacagaatgattctcagaaactcctttgtgatgtgtgcgttcaactcacagagtttaacctttcttttcatagagcagttaggaaacactctgtttgtaaagtctgcaagtggatattcagacctccttgaggccttcgttggaaacgggatttcttcatattctgctaga . PASS END=121485424;HOMLEN=0;SVLEN=335;SVTYPE=DUP:TANDEM;NTLEN=5 GT:PINDEL_DP 0/1:4

Run in parallel chromosome by chromosome

Hello Author,

I am using Pindel and would like to increase its speed beyond what multithreading gives. I was wondering which Pindel results would be lost or missed if I ran Pindel chromosome by chromosome.

I am reading your paper. It seems that results would not be affected by running each chromosome separately on different CPUs because Pindel processes one chromosome at a time. Is that correct?

Thank you for the advice and great software.

Cheers, James

How about different libraries in the same sample

Pindel requires a configuration file as input:

file.bam insert-size sample-name

What should I do if my sample has multiple libraries with different insert sizes, but in a single BAM file (produced by samtools merge)?

Pindel build fails every time on htslib

Every attempt to ./INSTALL fails with this error:

g++ -L/config/binaries/htslib/1.3.1/ -Wl,-rpath /config/binaries/htslib/1.3.1//lib pindel.o reader.o reporter.o searcher.o parameter.o refreader.o control_state.o search_deletions_nt.o search_inversions.o search_inversions_nt.o bam2depth.o search_tandem_duplications.o search_tandem_duplications_nt.o output_sorter.o farend_searcher.o search_variant.o searchshortinsertions.o searchdeletions.o output_file_data.o bddata.o shifted_vector.o read_buffer.o line_reader.o ifstream_line_reader.o gz_line_reader.o pindel_read_reader.o user_defined_settings.o fn_parameters.o logstream.o search_MEI_util.o search_MEI.o assembly.o genotyping.o -O3 -fopenmp -lhts -lm -lz -o pindel
/usr/bin/ld: cannot find -lhts
collect2: error: ld returned 1 exit status
make[1]: *** [pindel] Error 1
make[1]: Leaving directory `/config/source/pindel/src'
make: *** [pindel] Error 2

INSTALL failed
Possible reasons:

  1. 'cannot cd to [path]
    ->the htslib path provided was incorrect
  2. 'cannot find -lbam'
    ->htslib was not properly compiled/made, in that case, go to the htslib directory and follow the htslib installation instructions
    and run 'make'.

Things I have tried:

  • first attempt used our samtools installation (/config/binaries/samtools/1.3.1/) failed with above
  • installed htslib from .tgz on the github site, failed with same error
  • installed htslib from git clone, switched to 1.3.1, failed with the same error
  • installed htslib from git clone, back on head, failed with the same error
  • edited Makefile.local to put an "=" between rpath and the lib path, removed the extraneous /. Failed with the same error.

Running
ld --verbose -L/config/binaries/htslib/1.3.1/lib/ -lhts
on the command line gets me the error

attempt to open /config/binaries/htslib/1.3.1/lib//libhts.so succeeded
-lhts (/config/binaries/htslib/1.3.1/lib//libhts.so)
(lots of "found libs" msgs removed)
ld: warning: cannot find entry symbol _start; not setting start address

Is there something obvious I'm doing wrong?

pindel install fail

My install steps are:

  1. install htslib-develop
    unzip htslib-develop.zip
    cd /home/sclirl/Tools/htslib-develop
    ./configure --prefix=/home/sclirl/Tools/htslib-develop
    make
    make install
    (This is ok!)
  2. install pindel
    cd /home/sclirl/Tools/pindel-master
    ./INSTALL /home/sclirl/Tools/htslib-develop
    but it fails as follows. Any help would be appreciated! Thanks!

The failure output:
path is now: /home/sclirl/Tools/htslib-develop/
make: *** No rule to make target `Makefile.local', needed by `pindel'. Stop.
If this is the first time you're running this install script please wait a moment as we create the Makefile.local
make -C src pindel
make[1]: Entering directory `/home/sclirl/Tools/pindel-master/src'
make[1]: Leaving directory `/home/sclirl/Tools/pindel-master/src'
make[1]: Entering directory `/home/sclirl/Tools/pindel-master/src'
g++ -I/home/sclirl/Tools/htslib-develop/include -Wall -g -c -O3 -fopenmp pindel.cpp -o pindel.o
g++ -I/home/sclirl/Tools/htslib-develop/include -Wall -g -c -O3 -fopenmp reader.cpp -o reader.o
In file included from reader.cpp:40:
refreader.h:6:1: warning: null character(s) ignored
reader.cpp: In function 'void GetOneChrSeq(std::ifstream&, std::string&, bool)':
reader.cpp:168: error: 'RefReader' was not declared in this scope
reader.cpp:168: error: 'rr' was not declared in this scope
reader.cpp:168: error: expected type-specifier before 'RefReader'
reader.cpp:168: error: expected ';' before 'RefReader'
reader.cpp:170: error: type '<type error>' argument given to 'delete', expected pointer
make[1]: *** [reader.o] Error 1
make[1]: Leaving directory `/home/sclirl/Tools/pindel-master/src'
make: *** [pindel] Error 2

INSTALL failed
Possible reasons:

  1. 'cannot cd to [path]
    ->the htslib path provided was incorrect
  2. 'cannot find -lbam'
    ->htslib was not properly compiled/made, in that case, go to the htslib directory and follow the htslib installation instructions
    and run 'make'.

For further help, see the FAQ file included in this directory
Or contact us on [email protected]

-j option easily leads to segfault

A malformed BED file passed to the -j option leads to segmentation faults, which can be hard to diagnose.

  • Specifying a chrom in the BED which is not in reference will lead to seg fault
  • Having a comment ("# comment") in BED file will lead to seg fault (such comments are allowed in BED file specifications)

In general, implementing basic error checking when parsing this file and reporting errors more explicitly would be very helpful to the user.
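The kind of basic checking meant here can be sketched as a pre-flight script run before Pindel. This is an illustration under assumptions, not Pindel's parser: it takes the reference contig names as a set (for example read from a `.fai` index), and the function name is hypothetical.

```python
def check_bed_lines(bed_lines, reference_contigs):
    """Return (regions, errors) for BED lines, skipping blank lines and '#' comments."""
    regions, errors = [], []
    for number, line in enumerate(bed_lines, start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        if len(fields) < 3:
            errors.append(f"line {number}: expected at least 3 columns")
            continue
        chrom, start, end = fields[:3]
        if chrom not in reference_contigs:
            errors.append(f"line {number}: chromosome {chrom!r} not in reference")
            continue
        if not (start.isdigit() and end.isdigit()) or int(start) >= int(end):
            errors.append(f"line {number}: invalid interval {start}-{end}")
            continue
        regions.append((chrom, int(start), int(end)))
    return regions, errors


if __name__ == "__main__":
    bed = ["# comment\n", "chr1\t100\t200\n", "chrUn\t5\t50\n"]
    regions, errors = check_bed_lines(bed, {"chr1", "chr2"})
    print(regions)
    print(errors)
```

Reporting the offending line number turns both cases listed above into an explicit message instead of a segmentation fault.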
