andyrimmer / platypus

Platypus Variant Caller

License: GNU General Public License v3.0

Shell 0.31% Python 75.31% C 23.10% C++ 0.99% Makefile 0.29%

platypus's Issues

Best version to use?

For a production run of Platypus, is it better to download master from GitHub and compile it, or to use the 0.8.1 tar.gz release?

Reference call qualities not well calibrated, and not conservative enough

The QUAL values reported in REFCALL records need some work. These should always be conservative, i.e. a low score should be reported if there is any doubt about the quality of the data or if the coverage is low, or if there is any evidence for variation.

These scores currently get smeared out if the call block size is too high. One solution is to report, for the whole block, the score of the part we are least confident about.
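
The "least confident part" rule can be sketched as follows. This is a minimal illustration, assuming per-position scores are already available as a list; the function name `block_qual` is hypothetical and is not part of Platypus.

```python
def block_qual(per_position_quals):
    """Return a conservative QUAL for a reference-call block.

    The block inherits the score of its least confident position, so one
    low-coverage or ambiguous site lowers the score of the whole block.
    """
    if not per_position_quals:
        raise ValueError("a reference block needs at least one position")
    return min(per_position_quals)


# Hypothetical per-position scores: averaging would hide the weak site.
quals = [60, 58, 5, 61]
print(block_qual(quals))        # 5
print(sum(quals) / len(quals))  # 46.0
```

Taking the minimum keeps the block score conservative regardless of block size, at the cost of penalising a long block for a single doubtful position.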

Version on site http://www.well.ox.ac.uk/platypus not up-to-date

I know that this is probably my fault, but I tried unsuccessfully to build the version on the Wellcome site instead of the most up-to-date version on GitHub.

Maybe it would be useful to put a link on that website directing people here instead?

Part of the reason I could not build the version from the website was that I had to set a few environment variables pointing to htslib. Those instructions are also missing from that homepage.

Maybe you could point everyone here instead of having a few clueless individuals like me asking dumb questions on the user forums :)

Many thanks

Duarte

isSomaticCandidate() not defined in somaticMutationDetector.py

It is used on line 562. Thanks!

Platypus.py shebang line

I suggest using this shebang in the Platypus.py driver file:

#!/usr/bin/env python

https://github.com/andyrimmer/Platypus/blob/1b7d5b5ada68d0df4f6212d097325bf5e6f18e68/src/python/Platypus.py

This does not depend on a particular installation. It works, for example, with virtualenv setups, and lets python be found by searching the PATH variable, so users who are not root can supply the python they'd like to use when running Platypus.

Simple clean up

The downloadable archive is littered with OS X style ._ files.

Please run

$ find . -iname '._*' -exec rm -rf {} \;

before zipping and uploading.

Error: something is screwy

Hello,
I am running Platypus 0.8.1 as follows:

python Platypus.py callVariants --bamFiles=tumor.bam,normal.bam --output=out.vcf --refFile=$REFHG19 --assemble=1

and am receiving errors like:
ERROR - Exception in region 1:0-100000. Error was Something is screwy here.

The program successfully completes, but the VCF file is empty. It only gives this error on some BAM files. Any suggestions on what could be causing this in my input data?

Best,
Jeremiah

HLA genotyping

Hello,

I am trying to genotype HLA alleles using Platypus and the IMGT database, but I don't know how to use the --HLATyping=HLATYPING parameter.
Has anybody already done this?

Thank you

Inconsistency of genotype and allele number

I don't understand why the Platypus output contains two alternative alleles when none of the GT fields includes allele number two.
Platypus version 0.7.9.1 was run with the option "--minFlank=0". Could this be the cause?
Below is one of the questionable lines from the VCF file.
chr1 3889721 . AG A,GG 124 PASS BRF=0.16;FR=0.0990,0.0990;HP=20;HapScore=2;MGOF=70;MMLQ=27;MQ=56.86;NF=148,4;NR=195,1;PP=124,89;QD=0.370262390671;SC=AAAAAAAAAAAGAAAGGACAT;SbPval=0.66;Source=Platypus;TC=1604;TCF=786;TCR=818;TR=343,5;WE=3889737;WS=3889711
GT:GL:GOF:GQ:NR:NV
0/0:-1,-1,-1:31:35:27,30:3,0
0/1:-1,-1,-1:32:2:33,36:10,0
0/0:-1,-1,-1:33:38:34,36:3,0
0/0:-1,-1,-1:15:8:31,32:4,1
0/0:-1,-1,-1:37:19:21,23:7,0
0/0:-1,-1,-1:33:52:30,32:3,0
0/0:-1,-1,-1:32:12:37,38:10,0
0/0:-1,-1,-1:39:9:24,25:6,0
0/0:-1,-1,-1:38:2:16,17:5,0
0/0:-1,-1,-1:34:17:20,20:4,0
0/1:-1,-1,-1:32:3:22,24:8,0
0/0:-1,-1,-1:31:3:25,26:8,0
0/1:-1,-1,-1:48:3:21,21:8,0
0/0:-1,-1,-1:41:46:24,24:2,0
0/0:-1,-1,-1:44:26:13,13:1,0
0/0:-1,-1,-1:21:38:25,26:1,0
0/0:-1,-1,-1:33:2:23,22:7,0
0/0:-1,-1,-1:34:3:26,26:8,0
0/0:-1,-1,-1:31:28:20,22:2,0
0/0:-1,-1,-1:45:3:26,26:10,0
0/0:-1,-1,-1:43:33:15,18:3,0
0/0:-1,-1,-1:22:25:24,26:2,0
0/0:-1,-1,-1:42:23:36,42:14,0
0/0:-1,-1,-1:26:23:23,24:2,0
0/1:-1,-1,-1:22:2:27,28:6,0
0/1:-1,-1,-1:32:2:26,26:15,0
0/0:-1,-1,-1:20:5:20,23:6,0
0/0:-1,-1,-1:27:14:23,22:4,0
0/1:-1,-1,-1:45:2:20,20:10,0
0/0:-1,-1,-1:36:17:31,30:4,0
0/1:-1,-1,-1:45:3:22,24:5,0
0/0:-1,-1,-1:30:27:23,22:2,0
0/0:-1,-1,-1:28:24:17,18:1,0
0/0:-1,-1,-1:37:11:15,17:3,0
0/0:-1,-1,-1:55:6:25,25:4,0
0/0:-1,-1,-1:37:42:21,23:3,0
0/1:-1,-1,-1:25:2:20,23:5,0
0/0:-1,-1,-1:12:22:11,11:0,0
0/0:-1,-1,-1:23:46:30,34:3,1
0/0:-1,-1,-1:26:8:17,19:5,0
0/0:-1,-1,-1:38:50:29,33:2,0
0/0:-1,-1,-1:18:19:28,29:4,0
0/0:-1,-1,-1:33:13:25,26:2,0
0/0:-1,-1,-1:28:65:36,38:4,0
0/0:-1,-1,-1:56:8:24,24:7,0
0/0:-1,-1,-1:23:2:32,32:9,0
0/0:-1,-1,-1:29:38:33,35:6,0
0/0:-1,-1,-1:31:64:39,43:4,0
0/0:-1,-1,-1:28:21:40,38:7,0
0/0:-1,-1,-1:27:24:31,32:6,0
0/0:-1,-1,-1:41:28:12,15:1,0
0/0:-1,-1,-1:48:21:20,21:4,0
0/0:-1,-1,-1:44:6:25,26:5,2
0/0:-1,-1,-1:38:25:17,17:1,0
0/0:-1,-1,-1:42:50:30,30:7,1
0/0:-1,-1,-1:57:21:22,23:7,0
0/0:-1,-1,-1:24:3:22,22:8,0
0/0:-1,-1,-1:33:31:20,21:3,0
0/1:-1,-1,-1:39:3:24,26:10,0
0/0:-1,-1,-1:28:20:27,28:5,0
0/0:-1,-1,-1:32:29:25,27:4,0
0/0:-1,-1,-1:69:2:23,24:9,0
0/0:-1,-1,-1:29:5:25,25:5,0
0/1:-1,-1,-1:70:3:20,21:9,0
0/0:-1,-1,-1:36:40:31,36:7,0
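
A record like this can be flagged automatically. The sketch below is a minimal check, assuming the ALT column and the per-sample GT strings have already been extracted; the function name `unused_alt_indices` is hypothetical. It only reports the inconsistency and says nothing about why Platypus emitted the extra allele.

```python
def unused_alt_indices(alt_field, genotypes):
    """Return 1-based ALT allele indices that no sample genotype refers to."""
    n_alts = len(alt_field.split(","))
    used = set()
    for gt in genotypes:
        # Accept both unphased ("/") and phased ("|") separators.
        for allele in gt.replace("|", "/").split("/"):
            if allele.isdigit():  # skip missing alleles written as "."
                used.add(int(allele))
    return [i for i in range(1, n_alts + 1) if i not in used]


# ALT column and a few GT values taken from the record above.
print(unused_alt_indices("A,GG", ["0/0", "0/1", "0/0"]))  # [2]
```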

Reference call block size should be configurable

Currently the size of reference call blocks is hard-coded to 1kb. This should be configurable.

Is there any way to set ploidy?

Is it possible to specify if the samples are haploid or diploid? Or is Platypus only suitable for variant calling in diploid samples?

How to get the GL for multi-allelic variants

According to the header definition of GL ("Genotype log-likelihoods (natural log) for AA, AB and BB genotypes, where A = ref and B = variant. Only applicable for bi-allelic sites"), GL is only defined for bi-allelic sites. How can I get GL for multi-allelic variants? I don't understand how Platypus can output GT for multi-allelic sites without GL.
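
For reference, the VCF specification orders diploid genotype likelihoods so that genotype j/k (with j <= k) sits at index k*(k+1)/2 + j. The sketch below only illustrates that ordering for a site with more than one ALT allele; the function names are hypothetical, and it does not show how Platypus computes or would compute these values.

```python
def gl_index(j, k):
    """Index of diploid genotype j/k in a VCF GL/PL array (VCF ordering)."""
    a, b = sorted((j, k))
    return b * (b + 1) // 2 + a


def genotype_order(n_alleles):
    """List diploid genotypes in the order their likelihoods are written."""
    return [f"{j}/{k}" for k in range(n_alleles) for j in range(k + 1)]


# REF plus two ALT alleles gives six genotypes.
print(genotype_order(3))  # ['0/0', '0/1', '1/1', '0/2', '1/2', '2/2']
print(gl_index(1, 2))     # 4
```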

Add an option for Platypus to exit if an error is encountered

Currently Platypus will log an error and continue if, e.g. a variant cannot be left-aligned or if a variant is too large. Add an option that will cause Platypus to exit with an error message if any error is encountered.

Import Error on MacOSX 10.9

The following error has been reported by several people trying to use Platypus on Mac OSX 10.9.

Traceback (most recent call last):

File "Platypus.py", line 7, in <module>

import runner

File "/Volumes/DELL MAC OSX/PLATYPUS/Platypus_0.7.2 2/runner.py", line 10, in <module>

import variantcaller

File "cpopulation.pxd", line 12, in init variantcaller (variantcaller.c:12302)

File "chaplotype.pxd", line 13, in init cpopulation (cpopulation.c:9961)

ImportError: dlopen(/Volumes/DELL MAC OSX/PLATYPUS/Platypus_0.7.2 2/chaplotype.so, 2): Symbol not found: _extract0

Referenced from: /Volumes/DELL MAC OSX/PLATYPUS/Platypus_0.7.2 2/chaplotype.so

Expected in: flat namespace

in /Volumes/DELL MAC OSX/PLATYPUS/Platypus_0.7.2 2/chaplotype.so

Validate input VCF

When using --source=FILE.vcf, Platypus should check for correct formatting, correct reference base, etc. For example, at least one ALT allele must be supplied. Sensible warnings/errors should be given if there are problems.
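
The two checks named above could look roughly like this. It is a minimal sketch, assuming the reference is available as a plain dict from chromosome name to sequence; `check_record` and its arguments are hypothetical names, not Platypus functions.

```python
def check_record(chrom, pos, ref, alt, reference):
    """Return a list of problems found in one source-VCF record.

    `reference` is a hypothetical dict mapping chromosome name to sequence.
    `pos` is 1-based, as in VCF.
    """
    problems = []
    alts = [a for a in alt.split(",") if a and a != "."]
    if not alts:
        problems.append("no ALT allele supplied")
    seq = reference.get(chrom)
    if seq is None:
        problems.append(f"unknown chromosome {chrom}")
    else:
        expected = seq[pos - 1:pos - 1 + len(ref)]
        if expected.upper() != ref.upper():
            problems.append(f"REF {ref} does not match reference {expected}")
    return problems


reference = {"chr1": "ACGTACGT"}
print(check_record("chr1", 2, "C", "T", reference))  # []
print(check_record("chr1", 2, "G", ".", reference))
```

Returning a list of problems rather than raising on the first one lets the caller decide whether to warn and skip the record or to stop.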

Merge multi-process VCF output in memory

Currently the VCF output from a multi-process run gets merged at the end of the job, which means the whole output must be written to disk, then read again and output to the final location (which might be a pipe to stdout or a file). It should be possible to buffer the output in memory and then merge regularly without writing to disk.
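
One way to merge without a second pass over disk is a lazy k-way merge of already-sorted per-process streams. The sketch below uses the standard library's `heapq.merge`; it assumes each worker yields records sorted by (chromosome index, position), and the tuple layout is purely illustrative.

```python
import heapq


def merge_sorted_records(*streams):
    """Lazily merge record streams, each sorted by (chrom_index, pos)."""
    return heapq.merge(*streams, key=lambda rec: (rec[0], rec[1]))


# Hypothetical buffered output from two worker processes.
worker_a = [(0, 100, "A>T"), (0, 500, "G>C")]
worker_b = [(0, 250, "C>G"), (1, 10, "T>A")]
for rec in merge_sorted_records(worker_a, worker_b):
    print(rec)
```

Because `heapq.merge` holds only one pending record per stream, memory use is governed by how much each worker buffers rather than by the size of the final VCF.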

CRAM support using htslib/Samtools

Switch to newest Samtools version (or just use htslib directly), and test Platypus using CRAM and BAM
from the same dataset.

Argument parsing: silently ignores malformed arguments

I typed an argument with the wrong dashes: "–-" instead of "--" and the corresponding argument was silently ignored. Platypus should probably emit a warning.
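
A warning of this kind could be driven by a simple scan of the raw arguments. This is a minimal sketch, assuming the check runs before option parsing; `suspicious_args` and the set of look-alike characters are illustrative choices.

```python
# Unicode characters that are easily mistaken for an ASCII hyphen.
DASH_LOOKALIKES = "\u2012\u2013\u2014\u2015\u2212"


def suspicious_args(argv):
    """Return arguments whose leading dashes include a non-ASCII look-alike."""
    return [a for a in argv if any(c in DASH_LOOKALIKES for c in a[:2])]


# The first argument starts with an en dash followed by a hyphen.
argv = ["\u2013-nCPU=4", "--output=out.vcf"]
for arg in suspicious_args(argv):
    print("WARNING: argument starts with a non-ASCII dash:", repr(arg))
```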

Very High memory use and allocation in 0.7.4 (and 0.7.2)

Contents of email describing issue:

I’ve recently been trying to run Platypus (0.7.2, 0.7.4) and have been having some issues. Previously I had had some success in running version 0.5.2.

I’ve been trying to run Platypus on ~120 cattle sequenced to ~24x depth (2.8Gb genome); this is part of a larger 700-individual dataset (I would eventually like to run it on everything). Previously, with 0.5.2, I had been running a slightly smaller dataset of 100 individuals (same depth) by chromosome with nCPU=10 on our cluster (node specs: Debian, Python 2.7.3, 32 cores, 128GB RAM) with the assemble option turned on. Everything worked, though it was a little hard to estimate the amount of RAM needed.

However, when I try this with 0.7.x everything crashes with out-of-memory errors. Looking at the job while it’s running, I see that the python process is requesting insane amounts of VIRT memory, often tens of terabytes, while only using a few GB of RES memory (see error message below). After playing around with the various options, it seems --nCPU is causing the problem. If I disable that while leaving the assemble=1 option on, Platypus runs for a while, but its VIRT requests soon reach tens of terabytes again while its RES memory grows to more than 50GB (for a single thread, top line="13482 aeonsim 20 0 79.7t 41g 5128 R 100 32.9 20:31.99 python").

Even with both assemble and ncpu disabled the VIRT memory request soon goes crazy, and the RES memory quickly followed it to >50GB and was still growing when I had to kill it:
python ~/tools/Platypus_0.7.4/Platypus.py callVariants --refFile=/home/aeonsim/refs/bosTau6.fasta --regions=chr1 -o manual-run-test.vcf.gz --bamFiles=combined.bam.list
2014-07-24 14:21:00,250 - INFO - Beginning variant calling
2014-07-24 14:21:00,250 - INFO - Output will go to manual-run-test.vcf.gz
2014-07-24 14:21:37,598 - INFO - Processing region chr1:1-100001. (Only printing this message every 10 regions of size 100000)

So is this a bug in the memory allocation? Usage seems much higher than I would expect to be needed, and the VIRT requests are simply impossible to satisfy. Also, with assemble off, how exactly is Platypus calling larger indels? Is it still doing some form of local de novo assembly for them, or has it shifted to some other approach?

Secondly, can the issue with --nCPU be fixed? It’s much simpler to set up jobs where Platypus is given 1 chromosome and 20 threads to process it than to manually split the chromosomes up and start each Platypus job with a different 10-100MB window.

Finally, would it be possible to get a better idea of how much memory is needed as a function of threads and depth of coverage? Or is it possible to limit the amount of memory being used?

Also, do you have a Google group or mailing list set up for Platypus (or could you use GitHub?) so that other issues can be seen and the current version of Platypus can be tracked? When I downloaded the software last year I asked to be emailed when a new version was released, but that function of your current website doesn’t seem to be working. To check for a newer version I therefore have to download the -latest.tar.bz2 file and unpack it to see whether it’s new.

Anyway, thanks in advance, and let me know if you need any more information.

VIRT 87.6t
python ~/tools/Platypus_0.7.4/Platypus.py callVariants --nCPU=4 --refFile=/home/aeonsim/refs/bosTau6.fasta --regions=chr1 -o manual-run-test.vcf --bamFiles=combined.bam.list
2014-07-24 13:41:51,297 - INFO - Beginning variant calling
2014-07-24 13:41:51,298 - INFO - Output will go to manual-run-test.vcf
Traceback (most recent call last):
File "/home/aeonsim/tools/Platypus_0.7.4/Platypus.py", line 44, in <module>
possCommandscommand
File "/home/aeonsim/tools/Platypus_0.7.4/runner.py", line 542, in callVariants
runVariantCaller(options)
File "/home/aeonsim/tools/Platypus_0.7.4/runner.py", line 446, in runVariantCaller
process.start()
File "/usr/lib/python2.7/multiprocessing/process.py", line 130, in start
self._popen = Popen(self)
File "/usr/lib/python2.7/multiprocessing/forking.py", line 120, in __init__
self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

Installation issues.

Following the installation instructions gives me this output on Linux. Any ideas on how to compile under Linux? What am I missing?

> ./buildPlatypus.sh
> Building Platypus
> running build
> running build_py
> running build
> running build_py
> running build
> running build_py
> running build
> running build_py
> running build
> running build_py
> running build
> running build_py
> running build
> running build_py
> running build
> running build_py
> running build
> running build_ext
> building 'htslibWrapper' extension
> /apps/gcc/5.3.0/bin/gcc -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/apps/python/2.7.8/include/python2.7 -c htslibWrapper.c -o build/temp.linux-x86_64-2.7/htslibWrapper.o -msse2 -msse3 -funroll-loops -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -fPIC
> gcc -pthread -shared build/temp.linux-x86_64-2.7/htslibWrapper.o -lhts -o build/lib.linux-x86_64-2.7/htslibWrapper.so
> /usr/bin/ld: cannot find -lhts
> collect2: error: ld returned 1 exit status
> error: command 'gcc' failed with exit status 1
> 
> Setup failed. Check previous lines for errors

Make Platypus output bgzipped VCF by default

Currently the output is gzipped VCF. This cannot be indexed with tabix. It should be straight-forward to switch to bgzip.
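
Whether an existing output file is plain gzip or BGZF can be told from its header: a BGZF block is a gzip member with the FEXTRA flag set and an extra subfield identified by the bytes 'B', 'C'. The sketch below checks for that subfield; the function name `is_bgzf` is illustrative, and the code only inspects the header rather than validating the whole file.

```python
import gzip
import struct


def is_bgzf(header):
    """Return True if `header` (the first bytes of a file) starts a BGZF block."""
    # gzip magic, deflate method, FEXTRA flag set
    if len(header) < 18 or header[:4] != b"\x1f\x8b\x08\x04":
        return False
    xlen = struct.unpack("<H", header[10:12])[0]
    extra = header[12:12 + xlen]
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2 = extra[pos], extra[pos + 1]
        slen = struct.unpack("<H", extra[pos + 2:pos + 4])[0]
        if (si1, si2, slen) == (66, 67, 2):  # 'B', 'C', 2-byte payload
            return True
        pos += 4 + slen
    return False


# The standard 28-byte BGZF end-of-file block, used here as a known BGZF header.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600" "42430200" "1b00" "0300" "00000000" "00000000"
)
print(is_bgzf(BGZF_EOF))                                  # True
print(is_bgzf(gzip.compress(b"##fileformat=VCFv4.1\n")))  # False
```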

highly inbred species

Hi,

I work on highly inbred diploid plant species, in many plants we would expect >90-95% homozygosity. I have been testing Platypus and it seems to be faring quite well in this scenario. Do you think it is a legitimate application?

Agnieszka

>1bp SNPs?

Hi,
I have an issue with the output from Platypus which I think is problematic. When running Platypus with default settings on my data, I get quite a few lines of the following format:

chr3 71561603 . TATTACCTTA AATTACCTTG 665 PASS someInfo
chr3 156248949 . TT CC 1654 PASS someInfo

Looking at the alignment (and in the output of three other variant callers), there are well supported variants at these positions, but the way platypus is printing them is simply wrong. It should rather be:

chr3 71561603 . T A 665 PASS someInfo
chr3 71561612 . A G 665 PASS someInfo
chr3 156248949 . T C 1654 PASS someInfo
chr3 156248950 . T C 1654 PASS someInfo

Is there a way to fix this using different parameters?
Best,
Urs
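
As a post-processing workaround, equal-length REF/ALT records can be decomposed into per-base SNPs. The sketch below is a minimal illustration with a hypothetical function name; it is not a Platypus option, and note that splitting discards the information that the substitutions were called together on one haplotype.

```python
def split_mnp(chrom, pos, ref, alt):
    """Split an equal-length REF/ALT record into one SNP per mismatching base."""
    if len(ref) != len(alt):
        raise ValueError("only equal-length REF/ALT can be split into SNPs")
    return [
        (chrom, pos + i, r, a)
        for i, (r, a) in enumerate(zip(ref, alt))
        if r != a
    ]


print(split_mnp("chr3", 71561603, "TATTACCTTA", "AATTACCTTG"))
# [('chr3', 71561603, 'T', 'A'), ('chr3', 71561612, 'A', 'G')]
print(split_mnp("chr3", 156248949, "TT", "CC"))
# [('chr3', 156248949, 'T', 'C'), ('chr3', 156248950, 'T', 'C')]
```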

long deletion error messages

Hi,
I'm using Platypus to discover indels, in particular long deletions.
Testing it I found the long deletion I'm looking for in the error messages but not in the result file.

The errors, showing the long deletion, are the following:

ERROR - cpos = 111259961 end pos = 111259959. Variants are (DEL(chr9:111258473-111259959 -TGTC...TATT nReads = 0, Source= 4),)
ERROR - cpos = 111259961 end pos = 111259959. Variants are (SNP(chr9:111258459-111258459 -G +C nReads = 0, Source= 4), DEL(chr9:111258473-111259959 -TGTC...TATT nReads = 0, Source= 4))

Does anyone have suggestions about it? (Why is the deletion displayed by errors and not results?)

Thank you in advance.

BED end position should not be included

Hi Andy,

Running Platypus on a BED file sometimes gives me variants at the end positions of my BED intervals.
Under the BED format rules, the end position should be excluded from the analysis.
If I am correct, fixing this only requires changing one line in the code by adding -1.

BTW very good tool! Running time is very impressive.. 👍
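
For reference, BED intervals are 0-based and half-open, while VCF positions are 1-based. The sketch below shows the conversion and the matching membership test; the function names are hypothetical, and how this maps onto Platypus's internal region handling is not established here.

```python
def bed_to_one_based(start, end):
    """Convert a BED interval (0-based, half-open) to 1-based inclusive."""
    if end <= start:
        raise ValueError("BED interval must have end > start")
    return start + 1, end


def position_in_bed(pos, start, end):
    """True if 1-based position `pos` lies inside BED interval [start, end)."""
    return start < pos <= end


# BED line "chr1 100 200" covers 1-based positions 101..200.
print(bed_to_one_based(100, 200))       # (101, 200)
print(position_in_bed(201, 100, 200))   # False
```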

Python 3 support

Neither Platypus nor the de novo calling script have been tested with Python 3. The de novo script definitely does not work due to syntax errors.

Modify Platypus and scripts to work with Python3 and test for any differences in output

Missing variants at the beginning of a region when using only assembler

When I tried using only the assembler to call variants in a region, Platypus failed to make the call of an obvious variant right at the beginning of the region. Shifting the window by 15bp to the left (=k-mer length) enables it to call correctly.

We need to understand exactly what the problem is. Andy's thought is that at least one k-mer length of coverage is needed before the start of a variant for it to be reported from the bubble graph.

Platypus should down-sample by default

Coverage can be very variable across certain genomes due to collapsed repeats, and variability in the sequencing. Rather than skipping regions when the coverage is too high, Platypus should automatically down-sample coverage per-sample, to a sensible level.

This is already implemented for the alternative haplotype selection model, so it should be straightforward to implement this.
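
Per-sample down-sampling can be sketched as a capped random draw. This is only an illustration under stated assumptions: reads for one sample in one window are held in a list, `max_coverage` is a hypothetical parameter name, and this is not the code used by the alternative haplotype selection model.

```python
import random


def downsample(reads, max_coverage, seed=0):
    """Keep at most `max_coverage` reads, chosen uniformly, in original order."""
    if len(reads) <= max_coverage:
        return list(reads)
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    keep = sorted(rng.sample(range(len(reads)), max_coverage))
    return [reads[i] for i in keep]


# Hypothetical window with 1000 reads from one sample.
reads = list(range(1000))
print(len(downsample(reads, 100)))
```

Sampling indices and sorting them preserves the coordinate order of the reads, which downstream code usually relies on.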

Variant calling on 454 RNAseq data

Hi,
I'm running platypus on some 454 RNAseq data I have. This however, required a minor modification of the read data and the cython source and I would like to know whether you think this will have any significant influence on the results (besides the normal problems arising with variant calling on RNAseq data...)

I read that platypus has problems with the long N's in the cigar from spliced alignments. I therefore used the GATK splitNcigar tools to get rid of those. Am I correct in assuming that platypus does not care about the vast amount of redundant read names and H's in the cigar that do result from this?!
Trying to run Platypus on the split reads produced a couple of lines reading something like "... cannot fit into short ..." (I don't remember the exact wording, but it was only a warning, so I initially ignored it). Although the run continued and completed with "variant calling finished successfully", only a small part of chr1 was processed. I therefore changed all short datatypes to ints in the Cython source and recompiled, and everything is running smoothly now. However, I don't really understand why the data does not fit into a short, as the longest read I have is 600bp. Does Platypus perhaps use the CIGAR to calculate the read length, and does it wrongly include the hard-clipping operator (H) in this calculation? Btw: I'm curious to understand the problem as I will get some PacBio data in the near future :)
Best,
Urs

Cannot configure include and lib paths; defaults to system include and lib paths

Setting the C_INCLUDE_PATH, LIBRARY_PATH, and LD_LIBRARY_PATH variables to the directory of my htslib project prior to compilation doesn't seem to change anything as compilation always terminates with "htslib/bgzf.h is missing".

Manually changing incDirs and libraryDirs in setup.py also didn't change anything, nor did changing the paths in htslibWrapper.pxd.

The best parameters for platypus

Dear Andy,
We have a few trios sequenced at high depth with Illumina Hiseq200 (100bp PE). Each individual has data from 180bp paired-end libraries (30x), a 500bp paired-end library (10x), 800bp paired-end libraries (10x), 2kbp mate-pair libraries (10x) and 10kbp mate-pair libraries (5x). Do you have recommended parameters for calling SNPs and indels from this dataset?
I can see the default parameters at http://www.well.ox.ac.uk/platypus-doc.

However, it seems that you prefer that we not enable the assembly option. Is there any reason for this?

--assemble    Whether to use the assembler to generate candidate haplotypes    (default: 0)

Thank you very much.

Chromosome size in Platypus?

Hi,

Is there any limit on chromosome size in Platypus? I'm working with chromosomes larger than 500Mb, and Platypus did not call variants after 530Mb!

Thanks

Interpretation of FORMAT/NR and FORMAT/NV for multiallelic sites

Dear Andy,

How do I interpret the NR and NV fields at multi-allelic sites?

Consider the following line, for example.

hs37d5 35158918 . TGCGGC CGCGGC,CGCGGT 1553 PASS FR=0.25,0.25;MMLQ=33.0;TCR=357;HP=1;WE=35158931;Source=Platypus;WS=35158881;PP=1090.0,1553.0;TR=241,117;NF=88,46;TCF=230;NR=153,71;TC=587;MGOF=40;SbPval=0.56;MQ=58.9;QD=4.667100467;SC=GCCCTGGAGATGCGGCCCCCA;BRF=0.1;HapScore=3 GT:GL:GOF:GQ:NR:NV 1/0:-1.0,-1.0,-1.0:35.0:99:235,235:99,34 2/0:-1.0,-1.0,-1.0:29.0:99:89,89:35,33 2/0:-1.0,-1.0,-1.0:37.0:99:183,183:73,33 1/0:-1.0,-1.0,-1.0:40.0:99:80,80:34,17

What is given here?

Convert argparse options to booleans

Hi @andyrimmer,

would it be possible to use booleans in your argparse/optparse definitions?
For example here https://github.com/andyrimmer/Platypus/blob/master/src/python/runner.py#L517

parser.add_option("--filterDuplicates", action='store_true', type = 'boolean', default=False, required=False, help="If set to 1, duplicate reads will be removed based on the read-pair start and end")

I will create a PR if you like.
Bjoern
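
A boolean flag of this kind can be sketched with the standard library's argparse. Note that argparse's `store_true` action does not accept a `type` argument, and optparse's `add_option` has no `required` keyword, so the line quoted above would need adjusting either way. The sketch assumes a switch to argparse, which the source does not confirm, and it would change the command line from `--filterDuplicates=1` to a bare `--filterDuplicates`.

```python
import argparse

parser = argparse.ArgumentParser(prog="Platypus.py callVariants")
parser.add_argument(
    "--filterDuplicates",
    action="store_true",  # flag present -> True, absent -> False
    default=False,
    help="Remove duplicate reads based on the read-pair start and end",
)

print(parser.parse_args([]).filterDuplicates)                      # False
print(parser.parse_args(["--filterDuplicates"]).filterDuplicates)  # True
```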

Platypus option details

Hi
First off, we are having a lot of geeky fun with Platypus: run times are amazing!
But do you have more detailed descriptions of what the various options do and how they work?
And which options are the best go-to options for tweaking calling sensitivity?
Cheers
Steve

Documentation on sanityChecks.py usage

Can sanityChecks.py be used to test the functionality of a recently recompiled Platypus.py?

Thanks,
Ivan De Dios

Filtering in genotyping mode

When run in genotyping mode, most sites fail Q20. Perhaps site filters should be turned off for variants supplied from bams?

the same allele in multi-allelic variants & multiple records with same position

Hi Andy,
I came across some very strange variants in a Platypus VCF.
Firstly, I am confused that the same allele can appear two or more times in a multi-allelic variant, like "GTT" below:
20 3277031 . GT G,GTT,GTT,TT 2364 PASS

Secondly, I thought a record like "4:9364116" below was not allowed. If there are multiple records at the same position they should be merged, but I have found that Platypus sometimes outputs two records with the same position.
4 9364116 . N . 0 REFCALL
4 9364116 . T C 0 Q20;badReads;HapScore;MQ;QD
Are these bugs or am I missing anything?
Thanks

No genotype likelihood for multi-allelic loci

Hi Andy,
Thanks for developing the very cool Platypus.
I have found that there is no genotype likelihood for multi-allelic loci, and this causes problems when applying Beagle to phase the variants.
Do you consider the missing genotype likelihoods for multi-allelic loci a bug, or is this a deliberate design choice in Platypus?

Best,
Siyang

Merge gVCFs (N+1 Calling)

Platypus should be able to produce gVCFs for many individual samples, and then merge them to create a single VCF with appropriate genotypes across all samples. This is an alternative to joint calling on many samples, and should be much cheaper. Since Platypus can already output gVCF, it's only the merging that needs doing.

Compiling Platypus on POWER7

I got the following error about lack of SSE2 and SSE3 options in GCC on my CentOS 7.2 POWER7 machine. What can I do to compile this without using those instruction sets?
Platypus-compile-error.txt

Mapping quality issue with STAR

STAR assigns 255 to good reads, which is supposed to be reserved for "unknown." From the manual:

The mapping quality MAPQ (column 5) is 255 for uniquely mapping reads, and int(-10*log10(1-1/Nmap)) for multi-mapping reads.

Platypus should check for this value and, optionally, allow these reads through.
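
The formula quoted from the manual and the proposed opt-in can be sketched as follows. The first function evaluates the quoted expression directly; the second is a hypothetical filter in which `treat_255_as_unique` is an invented option name, not an existing Platypus flag.

```python
import math


def star_multimapper_mapq(nmap):
    """MAPQ from the quoted formula for a read mapping to `nmap` loci."""
    if nmap < 2:
        raise ValueError("formula applies to multi-mapping reads (Nmap >= 2)")
    return int(-10 * math.log10(1 - 1 / nmap))


def passes_mapq_filter(mapq, min_mapq, treat_255_as_unique=False):
    """Apply a MAPQ threshold, optionally letting 255 through as 'unique'."""
    if mapq == 255:
        return treat_255_as_unique
    return mapq >= min_mapq


print([star_multimapper_mapq(n) for n in (2, 3, 4, 5)])       # [3, 1, 1, 0]
print(passes_mapq_filter(255, 20))                            # False
print(passes_mapq_filter(255, 20, treat_255_as_unique=True))  # True
```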

Assembler suggests variants that fail minReads filter

The --minReads filter should prevent any variants from being reported if they are supported by fewer than minReads reads. This fails with variants suggested by the assembler, as it uses a filter based on accumulated Phred scores rather than simple read counts.

http://www.well.ox.ac.uk/platypus-paper-data is defunct

Hello,

I am interested in the Platypus, GATK, and Samtools calls on NA12878, and the fosmid calls, described in the Nature Genetics paper. However, the link above is defunct and I am not able to find where I can download these.

Many thanks for any help,

Jeremiah

segmentation fault, exit code 137

I have been testing platypus on a dataset with about 100 samples, a mixture of genomes and exomes. Some chromosomes (mainly smaller ones) generated output, but for many chromosomes, the output is only a temporary file. A log from one chromosome is given below. The BAM files are bwa mem aligned with realignment and dups marked. Any suggestions?

2014-07-25 15:45:53,456 - INFO - Beginning variant calling
2014-07-25 15:45:53,464 - INFO - Output will go to variant/germline/platypus/all_chr15.vcf
2014-07-25 15:54:43,006 - INFO - Processing region chr15:1-100001. (Only printing this message every 10 regions of size 100000)
2014-07-25 17:30:46,446 - INFO - Processing region chr15:1000001-1100001. (Only printing this message every 10 regions of size 100000)
2014-07-25 18:59:40,763 - INFO - Processing region chr15:2000001-2100001. (Only printing this message every 10 regions of size 100000)
2014-07-25 20:25:16,401 - INFO - Processing region chr15:3000001-3100001. (Only printing this message every 10 regions of size 100000)
2014-07-25 21:51:18,000 - INFO - Processing region chr15:4000001-4100001. (Only printing this message every 10 regions of size 100000)
2014-07-25 23:14:53,453 - INFO - Processing region chr15:5000001-5100001. (Only printing this message every 10 regions of size 100000)
2014-07-26 00:38:22,261 - INFO - Processing region chr15:6000001-6100001. (Only printing this message every 10 regions of size 100000)
2014-07-26 02:01:18,457 - INFO - Processing region chr15:7000001-7100001. (Only printing this message every 10 regions of size 100000)
2014-07-26 03:25:50,589 - INFO - Processing region chr15:8000001-8100001. (Only printing this message every 10 regions of size 100000)
2014-07-26 04:50:07,827 - INFO - Processing region chr15:9000001-9100001. (Only printing this message every 10 regions of size 100000)
2014-07-26 06:13:07,911 - INFO - Processing region chr15:10000001-10100001. (Only printing this message every 10 regions of size 100000)
2014-07-26 07:33:42,899 - INFO - Processing region chr15:11000001-11100001. (Only printing this message every 10 regions of size 100000)
2014-07-26 08:55:50,466 - INFO - Processing region chr15:12000001-12100001. (Only printing this message every 10 regions of size 100000)
2014-07-26 10:16:53,685 - INFO - Processing region chr15:13000001-13100001. (Only printing this message every 10 regions of size 100000)
2014-07-26 11:39:30,251 - INFO - Processing region chr15:14000001-14100001. (Only printing this message every 10 regions of size 100000)
2014-07-26 13:15:17,079 - INFO - Processing region chr15:15000001-15100001. (Only printing this message every 10 regions of size 100000)
2014-07-26 15:05:56,443 - INFO - Processing region chr15:16000001-16100001. (Only printing this message every 10 regions of size 100000)
2014-07-26 16:56:50,272 - INFO - Processing region chr15:17000001-17100001. (Only printing this message every 10 regions of size 100000)
2014-07-26 18:47:20,994 - INFO - Processing region chr15:18000001-18100001. (Only printing this message every 10 regions of size 100000)
2014-07-26 20:57:44,911 - INFO - Processing region chr15:19000001-19100001. (Only printing this message every 10 regions of size 100000)
2014-07-26 22:52:42,236 - INFO - Processing region chr15:20000001-20100001. (Only printing this message every 10 regions of size 100000)
/bin/bash: line 1:  1359 Segmentation fault      Platypus callVariants --bamFiles=bam/IHRT3408/DNA/IHRT3408_Normal.realigned.md.bam,bam/PANGRW/DNA/PANGRW_Normal.realigned.md.bam,bam/NAAEDH/DNA/NAAEDH_Normal.realigned.md.bam,bam/PASEFS/DNA/PASEFS_Normal.realigned.md.bam,bam/PASUUH/DNA/PASUUH_Normal.realigned.md.bam,bam/PAMJXS/DNA/PAMJXS_Normal.realigned.md.bam,bam/PALKDP/DNA/PALKDP_Normal.realigned.md.bam,bam/0A4HY5/DNA/0A4HY5_Normal.realigned.md.bam,bam/PANVJJ/DNA/PANVJJ_Normal.realigned.md.bam,bam/0A4HYB/DNA/0A4HYB_Normal.realigned.md.bam,bam/PALZGU/DNA/PALZGU_Normal.realigned.md.bam,bam/PAVCLP/DNA/PAVCLP_Normal.realigned.md.bam,bam/NAAECZ/DNA/NAAECZ_Normal.realigned.md.bam,bam/PAVALD/DNA/PAVALD_Normal.realigned.md.bam,bam/PAUBIT/DNA/PAUBIT_Normal.realigned.md.bam,bam/PAKFVX/DNA/PAKFVX_Normal.realigned.md.bam,bam/IHRT3232/DNA/IHRT3232_Normal.realigned.md.bam,bam/PARJXU/DNA/PARJXU_Normal.realigned.md.bam,bam/IHRT2660/DNA/IHRT2660_Normal.realigned.md.bam,bam/0A4I3S/DNA/0A4I3S_Normal.realigned.md.bam,bam/PATAWV/DNA/PATAWV_Normal.realigned.md.bam,bam/0A4HXG/DNA/0A4HXG_Normal.realigned.md.bam,bam/IHRT1665/DNA/IHRT1665_Normal.realigned.md.bam,bam/IHRT1667/DNA/IHRT1667_Normal.realigned.md.bam,bam/0A4HXS/DNA/0A4HXS_Normal.realigned.md.bam,bam/PAKZZK/DNA/PAKZZK_Normal.realigned.md.bam,bam/PATPBS/DNA/PATPBS_Normal.realigned.md.bam,bam/PANGPE/DNA/PANGPE_Normal.realigned.md.bam,bam/IHRT1184/DNA/IHRT1184_Normal.realigned.md.bam,bam/0A4I0W/DNA/0A4I0W_Normal.realigned.md.bam,bam/0A4I0Q/DNA/0A4I0Q_Normal.realigned.md.bam,bam/0A4I0S/DNA/0A4I0S_Normal.realigned.md.bam,bam/IHRT2817/DNA/IHRT2817_Normal.realigned.md.bam,bam/PAKUZU/DNA/PAKUZU_Normal.realigned.md.bam,bam/PARKAF/DNA/PARKAF_Normal.realigned.md.bam,bam/PAUTWB/DNA/PAUTWB_Normal.realigned.md.bam,bam/IHRT2389/DNA/IHRT2389_Normal.realigned.md.bam,bam/PASNZV/DNA/PASNZV_Normal.realigned.md.bam,bam/PASKZZ/DNA/PASKZZ_Normal.realigned.md.bam,bam/0A4I8U/DNA/0A4I8U_Normal.realigned.md.bam,bam/PATKSS/DNA/PATKSS_Normal.realigned.md.bam,
bam/PALKGN/DNA/PALKGN_Normal.realigned.md.bam,bam/PAPVYW/DNA/PAPVYW_Normal.realigned.md.bam,bam/PAPWWC/DNA/PAPWWC_Normal.realigned.md.bam,bam/NAAEDG/DNA/NAAEDG_Normal.realigned.md.bam,bam/NAAEDA/DNA/NAAEDA_Normal.realigned.md.bam,bam/PATEEM/DNA/PATEEM_Normal.realigned.md.bam,bam/0A4I6O/DNA/0A4I6O_Normal.realigned.md.bam,bam/NAAEDI/DNA/NAAEDI_Normal.realigned.md.bam,bam/0A4HX8/DNA/0A4HX8_Normal.realigned.md.bam,bam/0A4I9K/DNA/0A4I9K_Normal.realigned.md.bam,bam/0A4I9I/DNA/0A4I9I_Normal.realigned.md.bam,bam/IHRT1713/DNA/IHRT1713_Normal.realigned.md.bam,bam/PAUXPZ/DNA/PAUXPZ_Normal.realigned.md.bam,bam/IHRT3798/DNA/IHRT3798_Normal.realigned.md.bam,bam/PASEBY/DNA/PASEBY_Normal.realigned.md.bam,bam/PANSEN/DNA/PANSEN_Normal.realigned.md.bam,bam/IHRT1482/DNA/IHRT1482_Normal.realigned.md.bam,bam/PAPKWD/DNA/PAPKWD_Normal.realigned.md.bam,bam/0A4HZR/DNA/0A4HZR_Normal.realigned.md.bam,bam/PANMIG/DNA/PANMIG_Normal.realigned.md.bam,bam/PANPUM/DNA/PANPUM_Normal.realigned.md.bam,bam/PASFCV/DNA/PASFCV_Normal.realigned.md.bam,bam/PAVECB/DNA/PAVECB_Normal.realigned.md.bam,bam/PATUXZ/DNA/PATUXZ_Normal.realigned.md.bam,bam/IHRT2795/DNA/IHRT2795_Normal.realigned.md.bam,bam/PAUYTT/DNA/PAUYTT_Normal.realigned.md.bam,bam/PAMHYN/DNA/PAMHYN_Normal.realigned.md.bam,bam/PASYUK/DNA/PASYUK_Normal.realigned.md.bam,bam/IHRT1161/DNA/IHRT1161_Normal.realigned.md.bam,bam/PARDAX/DNA/PARDAX_Normal.realigned.md.bam,bam/0A4I65/DNA/0A4I65_Normal.realigned.md.bam,bam/PARGTM/DNA/PARGTM_Normal.realigned.md.bam,bam/PAUVUL/DNA/PAUVUL_Normal.realigned.md.bam,bam/IHRT1318/DNA/IHRT1318_Normal.realigned.md.bam,bam/PAMYYJ/DNA/PAMYYJ_Normal.realigned.md.bam,bam/PANZHX/DNA/PANZHX_Normal.realigned.md.bam,bam/PASSLM/DNA/PASSLM_Normal.realigned.md.bam,bam/NAAEDB/DNA/NAAEDB_Normal.realigned.md.bam,bam/PAPNVD/DNA/PAPNVD_Normal.realigned.md.bam,bam/PAMTCM/DNA/PAMTCM_Normal.realigned.md.bam,bam/0A4HMC/DNA/0A4HMC_Normal.realigned.md.bam,bam/PATJVI/DNA/PATJVI_Normal.realigned.md.bam,bam/PALHRL/DNA/PALHRL_Normal.realigned.md.ba
m,bam/PATMPU/DNA/PATMPU_Normal.realigned.md.bam,bam/PAVDTY/DNA/PAVDTY_Normal.realigned.md.bam,bam/PAUUML/DNA/PAUUML_Normal.realigned.md.bam,bam/IHRT1953/DNA/IHRT1953_Normal.realigned.md.bam,bam/PAPXGT/DNA/PAPXGT_Normal.realigned.md.bam,bam/0A4I4E/DNA/0A4I4E_Normal.realigned.md.bam,bam/0A4HLQ/DNA/0A4HLQ_Normal.realigned.md.bam,bam/0A4I4M/DNA/0A4I4M_Normal.realigned.md.bam,bam/0A4I4O/DNA/0A4I4O_Normal.realigned.md.bam,bam/PAPIJR/DNA/PAPIJR_Normal.realigned.md.bam,bam/PANXSC/DNA/PANXSC_Normal.realigned.md.bam,bam/PAMEKS/DNA/PAMEKS_Normal.realigned.md.bam,bam/PATMXR/DNA/PATMXR_Normal.realigned.md.bam,bam/0A4HLD/DNA/0A4HLD_Normal.realigned.md.bam,bam/PATMIF/DNA/PATMIF_Normal.realigned.md.bam,bam/PATXFN/DNA/PATXFN_Normal.realigned.md.bam,bam/PALECC/DNA/PALECC_Normal.realigned.md.bam,bam/PAMLKS/DNA/PAMLKS_Normal.realigned.md.bam,bam/IHRT2576/DNA/IHRT2576_Normal.realigned.md.bam,bam/PAMHLF/DNA/PAMHLF_Normal.realigned.md.bam,bam/PALWWX/DNA/PALWWX_Normal.realigned.md.bam,bam/0A4I40/DNA/0A4I40_Normal.realigned.md.bam,bam/PASRNE/DNA/PASRNE_Normal.realigned.md.bam,bam/0A4I42/DNA/0A4I42_Normal.realigned.md.bam,bam/0A4I44/DNA/0A4I44_Normal.realigned.md.bam,bam/0A4I48/DNA/0A4I48_Normal.realigned.md.bam,bam/PARFTG/DNA/PARFTG_Normal.realigned.md.bam,bam/PAKXLD/DNA/PAKXLD_Normal.realigned.md.bam,bam/PAPFLB/DNA/PAPFLB_Normal.realigned.md.bam,bam/0A4I5B/DNA/0A4I5B_Normal.realigned.md.bam,bam/PARBGW/DNA/PARBGW_Normal.realigned.md.bam,bam/PAMRHD/DNA/PAMRHD_Normal.realigned.md.bam,bam/PAUTYB/DNA/PAUTYB_Normal.realigned.md.bam,bam/PANZZJ/DNA/PANZZJ_Normal.realigned.md.bam,bam/PALFYN/DNA/PALFYN_Normal.realigned.md.bam --regions=chr15 --output=variant/germline/platypus/all_chr15.vcf --refFile=/data/CCRBioinfo/public/GATK/bundle/2.3/hg19/ucsc.hg19.fasta

Segmentation fault on multiple bam files

We have some TCGA exome BAM files that have been realigned with Stampy (after adding read groups with Picard). The files were then sorted and indexed with samtools. Platypus runs correctly on each individual file, but when more than one sample is supplied we get a segmentation fault. Could this be a sorting problem? Thanks in advance.

Increased number of variants with HapScore filter tag

The number of variants with the HapScore filter tag (variants that failed on HapScore) has increased between the two versions I am using. The difference is huge compared to the other filters, and it throws away a lot of good variants.

Here is the count of filter tags for indels from an exome called by Platypus_Version_0.7.4. For simplicity, I removed the variants with multiple filter tags.

   1671 alleleBias
   5375 badReads
    312 HapScore
   6262 MQ
  71988 PASS
   8368 Q20
   1511 QD
  11479 SC
      2 strandBias

And for the same sample using Platypus_Version_0.8.1

   1633 alleleBias
   6770 badReads
   6509 HapScore
   6030 MQ
  86891 PASS
   9519 Q20
   1598 QD
  11534 SC
      2 strandBias
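
For anyone who wants to reproduce this kind of tally on their own output, here is a minimal sketch that counts the FILTER column of a VCF file. It is not part of Platypus and is not necessarily how the counts above were produced: the function names, the "any ALT allele differs in length from REF" definition of an indel, and the option to skip multi-tag records are all assumptions made for illustration.

```python
import gzip
from collections import Counter


def open_vcf(path):
    """Open a plain or gzip-compressed VCF file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def count_filter_tags(path, indels_only=True, single_tag_only=True):
    """Tally the FILTER column (7th field) of a VCF file.

    Hypothetical helper: the indel definition and the options are
    illustrative assumptions, not Platypus behaviour.
    """
    counts = Counter()
    with open_vcf(path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 7:
                continue  # not a complete VCF record
            ref, alts, filt = fields[3], fields[4].split(","), fields[6]
            # Treat a record as an indel if any ALT allele differs in length from REF.
            if indels_only and all(len(alt) == len(ref) for alt in alts):
                continue
            # Optionally drop records that carry more than one filter tag.
            if single_tag_only and ";" in filt:
                continue
            counts[filt] += 1
    return counts
```

Calling `count_filter_tags("calls.vcf")` on the output of each version and printing the two counters side by side gives a comparison in the same shape as the lists in this report (`calls.vcf` is a placeholder file name).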

Below are the filter tags that Platypus_Version_0.7.4 assigned to the variants that carry the HapScore filter tag in Platypus_Version_0.8.1. A major portion of them passed all the filters.

     49 alleleBias
     86 badReads
      6 badReads;alleleBias
      2 badReads;MQ
     18 badReads;QD
      1 badReads;QD;alleleBias
    272 HapScore
      3 HapScore;alleleBias
      8 HapScore;badReads
      2 HapScore;badReads;alleleBias
      3 HapScore;badReads;QD
      2 HapScore;badReads;QD;strandBias
      9 HapScore;MQ
     23 HapScore;QD
    126 MQ
      4 MQ;QD
   4414 PASS
    110 Q20
      1 Q20;badReads
      6 Q20;badReads;QD
      1 Q20;HapScore
      2 Q20;HapScore;QD
      3 Q20;MQ
      1 Q20;MQ;QD
     54 Q20;QD
    142 QD
      1 QD;alleleBias
      4 SC
      1 SC;alleleBias
      1 SC;badReads;QD
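
A cross-tabulation like the one above can be sketched as follows: index the older call set by variant, then look up every record that carries the tag of interest in the newer call set. This is only an illustration under stated assumptions: the function names are hypothetical, variants are matched on an exact (CHROM, POS, REF, ALT) key, both files are uncompressed VCF, and records missing from the older file are counted under the made-up label `absent`.

```python
from collections import Counter


def load_filters(path):
    """Map (CHROM, POS, REF, ALT) to the FILTER string of each VCF record."""
    filters = {}
    with open(path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 7:
                continue  # not a complete VCF record
            key = (fields[0], fields[1], fields[3], fields[4])
            filters[key] = fields[6]
    return filters


def old_tags_for_new_tag(old_vcf, new_vcf, tag="HapScore"):
    """Count the old FILTER values of variants that carry `tag` in the new file.

    Hypothetical helper: exact-key matching is an assumption and will miss
    variants that are represented differently in the two call sets.
    """
    old = load_filters(old_vcf)
    new = load_filters(new_vcf)
    counts = Counter()
    for key, filt in new.items():
        if tag in filt.split(";"):
            counts[old.get(key, "absent")] += 1
    return counts
```

A large `PASS` count in the result, as in the list above, means that the newer version is filtering calls the older version accepted.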

I checked a couple of samples, and all of them show the same issue. I couldn't find any information on this HapScore issue.

Run-time error: cerrormodel.so: undefined symbol: min

I am trying to build and run Platypus, but am having a run-time issue.

System: RHEL6, Python-2.7, Cython-0.23, GCC-5.1

"make" executes successfully, but when I run something as light as: python Platypus.py --help

I get:

Traceback (most recent call last):
  File "/xchip/gistic/Jeremiah/GIT/Platypus/bin/Platypus.py", line 9, in <module>
    import runner
  File "/xchip/gistic/Jeremiah/GIT/Platypus/bin/runner.py", line 9, in <module>
    import variantcaller
  File "variant.pxd", line 17, in init variantcaller (cython/variantcaller.c:16975)
  File "variant.pyx", line 1, in init variant (cython/variant.c:10330)
ImportError: /xchip/gistic/Jeremiah/GIT/Platypus/bin/cerrormodel.so: undefined symbol: min

Has anyone on the Platypus team seen this before / know a workaround? I'm puzzled.

Best,
Jeremiah

EDIT: More info on the link command for cerrormodel.c, which executes just fine:

gcc -pthread -shared -L/broad/software/free/Linux/redhat_6_x86_64/pkgs/tcltk8.5.9/lib -Wl,-rpath,/broad/software/free/Linux/redhat_6_x86_64/pkgs/tcltk8.5.9 -L/broad/software/free/Linux/redhat_6_x86_64/pkgs/sqlite_3.7.5/lib -Wl,-rpath,/broad/software/free/Linux/redhat_6_x86_64/pkgs/sqlite_3.7.5/lib -Wl,-rpath,/broad/software/free/Linux/redhat_6_x86_64/pkgs/db_4.7.25/lib build/temp.linux-x86_64-2.7/cython/cerrormodel.o build/temp.linux-x86_64-2.7/c/tandem.o -L/broad/software/free/Linux/redhat_6_x86_64/pkgs/python_2.7.1-sqlite3-rtrees/lib -lpython2.7 -o build/lib.linux-x86_64-2.7/cerrormodel.so

Crash when running with Tophat BAMs

Hi Andy,

The error occurs while parsing the BAM header.
It works on the same BAM files when I run each one individually.

Here is the log file -

    2014-09-19 14:54:11,972 - INFO - Beginning variant calling
    2014-09-19 14:54:11,974 - INFO - Output will go to VariantCalls.vcf
    2014-09-19 14:54:12,195 - DEBUG - The following regions will be searched: [('chr16', 89985900, 89986000)]
    2014-09-19 14:54:12,200 - DEBUG - The following genomic regions will be searched: [('chr16', 89985900, 89986000)]
    2014-09-19 14:54:12,200 - INFO - Searching for variants in the following regions: [('chr16', 89985900, 89986000)]
    2014-09-19 14:54:12,224 - DEBUG - Error in BAM header sample parsing. Error was
    'RG'

    2014-09-19 14:54:12,224 - DEBUG - Adding sample name accepted_hits, from BAM file WTCHG_123485_268_uniq_tophat/accepted_hits.bam
    2014-09-19 14:54:12,396 - DEBUG - Error in BAM header sample parsing. Error was
    'RG'

    2014-09-19 14:54:12,396 - DEBUG - Adding sample name accepted_hits, from BAM file WTCHG_123485_270_uniq_tophat/accepted_hits.bam
    2014-09-19 14:54:12,404 - DEBUG - Max haplotypes used for initial haplotype filtering = 50
    2014-09-19 14:54:12,404 - DEBUG - Max haplotypes used for genotype generation = 50
    2014-09-19 14:54:12,404 - DEBUG - Max genotypes = 1275
    2014-09-19 14:54:12,407 - INFO - Processing region chr16:89985900-89986000. (Only printing this message every 10 regions of size 100000)
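
The log shows that neither header yields an 'RG' entry, and that both files are then given the same fallback sample name, `accepted_hits`, taken from the file name. The sketch below mimics that fallback on plain header text (for example the output of `samtools view -H`) so that name clashes can be spotted before a multi-sample run. It is not Platypus code: the function names are hypothetical, the fallback rule is inferred from the log messages only, and whether the clash is what causes this crash is not established here.

```python
import os


def sample_names(header_text, bam_path):
    """Return sample names from @RG SM tags, or fall back to the file name.

    Hypothetical helper: the fallback (file name without its extension)
    is inferred from the log above, not taken from the Platypus source.
    """
    names = []
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SM:"):
                name = field[3:]
                if name not in names:
                    names.append(name)
    if not names:
        # No read group with a sample tag: use the BAM file's base name.
        names.append(os.path.splitext(os.path.basename(bam_path))[0])
    return names


def duplicated_names(headers_by_path):
    """Report sample names that more than one BAM file would contribute."""
    seen = {}
    for path, header in headers_by_path.items():
        for name in sample_names(header, path):
            seen.setdefault(name, []).append(path)
    return {name: paths for name, paths in seen.items() if len(paths) > 1}
```

If `duplicated_names` reports a clash, adding read groups with distinct SM values to each BAM file should give every input its own sample name.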

HLA Genotyping

Dear Sirs,

I am trying to perform HLA genotyping for HLA-A, HLA-B, HLA-C, HLA-DRB1, HLA-DRB2 and HLA-DQA1. Could you tell me where I can find the HLA reference files in VCF format, as they are required as one of the input parameters?

Thank you.
