dellytools / delly

DELLY2: Structural variant discovery by integrated paired-end and split-read analysis

License: BSD 3-Clause "New" or "Revised" License

Topics: structural-variation, sv-discovery, delly-users, delly, rearrangement, genomic, cancer-genomics, germline, tumor, svs

delly's Introduction

Delly


Delly is an integrated structural variant (SV) prediction method that can discover, genotype and visualize deletions, tandem duplications, inversions and translocations at single-nucleotide resolution in short-read and long-read massively parallel sequencing data. It uses paired-ends, split-reads and read-depth to sensitively and accurately delineate genomic rearrangements throughout the genome.

Installing Delly

Delly is available as a statically linked binary, a Singularity container (SIF file), a Docker container, or via Bioconda. You can also build Delly from source using a recursive clone and make.

git clone --recursive https://github.com/dellytools/delly.git

cd delly/

make all
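Alternatively, a minimal Bioconda-based installation (a sketch assuming a conda setup with the bioconda and conda-forge channels already configured) looks like:

conda install -c bioconda delly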

There is a Delly discussion group delly-users for usage and installation questions.

Delly multi-threading mode

Delly supports parallel computing using the OpenMP API (www.openmp.org).

make PARALLEL=1 src/delly

You can set the number of threads using the environment variable OMP_NUM_THREADS.

export OMP_NUM_THREADS=2

Delly primarily parallelizes on the sample level. Hence, OMP_NUM_THREADS should always be less than or equal to the number of input samples.

Running Delly

Delly needs a sorted, indexed and duplicate-marked BAM file for every input sample. An indexed reference genome is required to identify split-reads. Common workflows for germline and somatic SV calling are outlined below.
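For reference, a minimal pre-processing sketch using samtools (file names are placeholders; many pipelines use Picard MarkDuplicates instead of samtools markdup):

samtools faidx hg38.fa                           # index the reference genome
samtools sort -n -o namesorted.bam input.raw.bam # name-sort for fixmate
samtools fixmate -m namesorted.bam fixmate.bam   # add mate score tags needed by markdup
samtools sort -o positionsorted.bam fixmate.bam  # coordinate-sort
samtools markdup positionsorted.bam input.bam    # mark duplicates
samtools index input.bam                         # index the BAM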

delly call -g hg38.fa input.bam > delly.vcf

You can also specify an output file in BCF format.

delly call -o delly.bcf -g hg38.fa input.bam

bcftools view delly.bcf > delly.vcf

Example

A small example is included for short-read, long-read and copy-number variant calling.

delly call -g example/ref.fa -o sr.bcf example/sr.bam

delly lr -g example/ref.fa -o lr.bcf example/lr.bam

delly cnv -g example/ref.fa -m example/map.fa.gz -c out.cov.gz -o cnv.bcf example/sr.bam

More in-depth tutorials for SV calling are linked from the upstream repository.

Somatic SV calling

  • At least one tumor sample and a matched control sample are required for SV discovery

delly call -x hg38.excl -o t1.bcf -g hg38.fa tumor1.bam control1.bam

  • Somatic pre-filtering requires a tab-delimited sample description file where the first column is the sample id (as in the VCF/BCF file) and the second column is either tumor or control.

delly filter -f somatic -o t1.pre.bcf -s samples.tsv t1.bcf
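For example, a minimal tab-delimited samples.tsv (tumor1 and control1 are placeholder IDs that must match the sample names in the BCF header):

tumor1	tumor
control1	control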

  • Genotype pre-filtered somatic sites across a larger panel of control samples to efficiently filter false positives and germline SVs. For performance reasons, this can be run in parallel for each sample of the control panel, and you may want to combine multiple pre-filtered somatic site lists from multiple tumor samples.

delly call -g hg38.fa -v t1.pre.bcf -o geno.bcf -x hg38.excl tumor1.bam control1.bam ... controlN.bam

  • Post-filter for somatic SVs using all control samples.

delly filter -f somatic -o t1.somatic.bcf -s samples.tsv geno.bcf

Germline SV calling

  • SV calling is done by sample for high-coverage genomes or in small batches for low-coverage genomes

delly call -g hg38.fa -o s1.bcf -x hg38.excl sample1.bam

  • Merge SV sites into a unified site list

delly merge -o sites.bcf s1.bcf s2.bcf ... sN.bcf

  • Genotype this merged SV site list across all samples. This can be run in parallel for each sample.

delly call -g hg38.fa -v sites.bcf -o s1.geno.bcf -x hg38.excl s1.bam

delly call -g hg38.fa -v sites.bcf -o sN.geno.bcf -x hg38.excl sN.bam

  • Merge all genotyped samples to get a single VCF/BCF using bcftools merge

bcftools merge -m id -O b -o merged.bcf s1.geno.bcf s2.geno.bcf ... sN.geno.bcf

  • Apply the germline SV filter which requires at least 20 unrelated samples

delly filter -f germline -o germline.bcf merged.bcf

Delly for long reads from PacBio or ONT

Delly also supports long-reads for SV discovery.

delly lr -y ont -o delly.bcf -g hg38.fa input.bam

delly lr -y pb -o delly.bcf -g hg38.fa input.bam

Read-depth profiles and copy-number variant calling

You can generate read-depth profiles with delly. This requires a mappability map which can be downloaded here:

Mappability Maps

The command to count reads in 10kbp mappable windows and normalize the coverage is:

delly cnv -a -g hg38.fa -m hg38.map -c out.cov.gz -o out.bcf input.bam

The output file out.cov.gz can be plotted using R to generate normalized copy-number profiles and segment the read-depth information:

Rscript R/rd.R out.cov.gz

Instead of segmenting the read-depth information, you can also visualize the CNV calls.

bcftools query -f "%CHROM\t%POS\t%INFO/END\t%ID[\t%RDCN]\n" out.bcf > seg.bed

Rscript R/rd.R out.cov.gz seg.bed

With -s you can output a statistics file with GC bias information.

delly cnv -g hg38.fa -m hg38.map -c out.cov.gz -o out.bcf -s stats.gz input.bam

zcat stats.gz | grep "^GC" > gc.bias.tsv

Rscript R/gcbias.R gc.bias.tsv

Germline CNV calling

Delly uses GC and mappability fragment correction to call CNVs. This requires a mappability map.

  • Call CNVs for each sample and optionally refine breakpoints using delly SV calls

delly cnv -o c1.bcf -g hg38.fa -m hg38.map -l delly.sv.bcf input.bam

  • Merge CNVs into a unified site list

delly merge -e -p -o sites.bcf -m 1000 -n 100000 c1.bcf c2.bcf ... cN.bcf

  • Genotype CNVs for each sample

delly cnv -u -v sites.bcf -g hg38.fa -m hg38.map -o geno1.bcf input.bam

bcftools merge -m id -O b -o merged.bcf geno1.bcf ... genoN.bcf

  • Filter for germline CNVs

delly classify -f germline -o filtered.bcf merged.bcf

  • Optional: Plot the copy-number distribution for a large number of samples (>>100)

bcftools query -f "%ID[\t%RDCN]\n" filtered.bcf > plot.tsv

Rscript R/cnv.R plot.tsv

Somatic copy-number alterations (SCNAs)

  • For somatic copy-number alterations, delly first segments the tumor genome (-u is required). Depending on the coverage, tumor purity and heterogeneity, you can adapt the parameters -z, -t and -x, which control the sensitivity of SCNA detection.

delly cnv -u -z 10000 -o tumor.bcf -c tumor.cov.gz -g hg38.fa -m hg38.map tumor.bam

  • Then these tumor SCNAs are genotyped in the control sample (-u is required).

delly cnv -u -v tumor.bcf -o control.bcf -g hg38.fa -m hg38.map control.bam

  • The VCF IDs are matched between tumor and control. Thus, you can merge both files using bcftools.

bcftools merge -m id -O b -o tumor_control.bcf tumor.bcf control.bcf

  • Somatic filtering requires a tab-delimited sample description file where the first column is the sample id (as in the VCF/BCF file) and the second column is either tumor or control.

delly classify -p -f somatic -o somatic.bcf -s samples.tsv tumor_control.bcf

  • Optional: Plot the SCNAs using bcftools and R.

bcftools query -s tumor -f "%CHROM\t%POS\t%INFO/END\t%ID[\t%RDCN]\n" somatic.bcf > segmentation.bed

Rscript R/rd.R tumor.cov.gz segmentation.bed

FAQ

  • Visualization of SVs
    You may want to try out wally to plot candidate structural variants. The paired-end coloring is explained in wally's README file.

  • What is the smallest SV size Delly can call?
    For short-reads, this depends on the sharpness of the insert size distribution. For an insert size of 200-300bp with a 20-30bp standard deviation, Delly starts to call reliable SVs >=300bp. Delly also supports calling small InDels using soft-clipped reads only; the smallest SV size called is 15bp. For long-reads, delly calls SVs >=30bp.

  • Can Delly be used on a non-diploid genome?
    Yes and no. The SV site discovery works for any ploidy. However, Delly's genotyping model assumes diploidy (hom. reference, het. and hom. alternative). The CNV calling allows you to set the baseline ploidy on the command line.

  • Delly is running too slowly, what can I do?
    You should exclude telomere and centromere regions as well as all unplaced contigs (-x command-line option). In addition, you can filter input reads more stringently using -q 20 and -s 15. Lastly, -z can be set to 5 for high-coverage data.
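    For example, a more stringent short-read call combining these options (values taken from the advice above; adjust for your own data):

delly call -g hg38.fa -x hg38.excl -q 20 -s 15 -z 5 -o delly.bcf input.bam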

  • Are non-unique alignments, multi-mappings and/or multiple split-read alignments allowed?
    Delly expects two alignment records in the BAM file for every read pair, one for the first and one for the second read. Multiple split-read alignment records of a given read are allowed if and only if one of them is a primary alignment whereas all others are marked as secondary or supplementary. This is the default for bwa, minimap2 and many other aligners.
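    One quick way to sanity-check this uses standard samtools flag filters (a generic sketch, not part of Delly itself):

samtools view -c -F 0x900 input.bam   # primary alignment records only
samtools view -c -f 0x100 input.bam   # secondary alignments
samtools view -c -f 0x800 input.bam   # supplementary alignments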

  • What pre-processing of bam files is required?
    BAM files need to be sorted, indexed and ideally duplicate-marked.

  • Usage/discussion mailing list?
    There is a delly discussion group delly-users.

  • Docker/Singularity support?
    There is a Delly Docker container and a Singularity container (*.sif file) available.
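    For example, pulling the container typically looks like the following (the image name dellytools/delly is an assumption based on the project name; check the release page for the exact image and SIF URIs):

docker pull dellytools/delly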

  • How can I compute a mappability map?
    A basic mappability map can be built using dicey, samtools and bwa with the below commands (as an example for the sacCer3 reference):

dicey chop sacCer3.fa
bwa index sacCer3.fa
bwa mem sacCer3.fa read1.fq.gz read2.fq.gz | samtools sort -@ 8 -o srt.bam -
samtools index srt.bam 
dicey mappability2 srt.bam 
gunzip map.fa.gz && bgzip map.fa && samtools faidx map.fa.gz 
  • Bioconda support?
    Delly is available via bioconda.

Citation

Tobias Rausch, Thomas Zichner, Andreas Schlattl, Adrian M. Stuetz, Vladimir Benes, Jan O. Korbel.
DELLY: structural variant discovery by integrated paired-end and split-read analysis.
Bioinformatics. 2012 Sep 15;28(18):i333-i339.
https://doi.org/10.1093/bioinformatics/bts378

License

Delly is distributed under the BSD 3-Clause license. Consult the accompanying LICENSE file for more details.

delly's People

Contributors

chapplec, jvhaarst, lindenb, mhyfritz, smoe, tobiasrausch


delly's Issues

Please vendor dependencies instead of git submodules

Hi,

Please vendor the dependencies instead of using submodules. For a packager, it is quite hard to work with the current build system you have.

The current releases on GitHub are actually more harmful than helpful, because after downloading them one ends up with basically broken code.

This is also in reference to: #32

Naming of output files

Hi Tobi,

Is there any particular reason why you remove the suffix of the filename twice, for example here?
Because that leads to output files being overwritten when I run

delly merge -t INV -o merge.INV.bcf a.bcf a.bcf
delly merge -t TRA -o merge.TRA.bcf a.bcf a.bcf

Thanks

Segmentation fault

Hi

I got a segmentation fault with the latest delly (Version: 0.7.3).

0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|


18187 Segmentation fault (core dumped)

OS: Ubuntu 15.1

Command:
/home/gwenneg/bin/delly/src/delly call -t DEL -q 13 -o $OUTDIR/$SAMPLENAME.delly.bcf -g $REF $TMPDIR/$TUMOR.sorted.bam $TMPDIR/$NORMAL.sorted.bam

My bam files are sorted and indexed

Thanks for helping

Gwenneg

delly_v0.7.1

Hi Tobias,

Have you made the split-read analysis option -g mandatory with this release of delly?
I don't seem to be able to run delly otherwise.

Thanks,
Sirisha

Confusion about SV genotypes in merged delly results

Hi Tobias,

Recently, I have tried your "Germline SV calling" pipeline by Delly on human NGS data.
Step 1: Using "delly call" to generate the individual-separated BCF files;
Step 2: Using "delly merge" to generate a unified site list;
Step 3: Using "delly call" again to regenerate the new individual-separated BCF files;
Step 4: Using "bcftools merge" to merge all the new individual-separated BCF files together;
Step 5: Using "delly filter" to filter out bad SVs.

According to the definition of the VCF format (http://samtools.github.io/hts-specs/VCFv4.2.pdf), it is easy to understand that "./." stands for "missing allele", "0/0" stands for "homozygous reference", "0/1" stands for "heterozygous" and "1/1" stands for "homozygous alternative".

And then my confusions are:
(1) During Step 1, if an SV was coded "./." in one individual-separated BCF file, does the "./." mean "this SV happened in this individual, but we just don't know its genotype" or "this SV didn't happen in this individual"? (In fact, many other SV callers that don't generate SV genotypes seem to simply include or exclude the SV in the BCF file to tell us whether an individual has the SV or not.)

(2) During Step 3 + Step 4, when we try to merge all the individual-separated BCF files together based on a unified site list, many individuals simply won't have some of the SVs (as opposed to having missing genotypes), so how do you code those loci? (Do you remove those SVs just because some individuals don't have them, or do you code those loci as "./."?)

Looking for your response!

Best regards,
Leilei

Visualising Delly output + Fusion detection

Hi,

Thank you for processing the ICGC data. I have obtained about ~250 patient VCFs from ICGC, and I would like to call a specific fusion, caused by a translocation, based on the Delly output.

1) What would be the best method for this kind of query?
—> I thought of visualising and deciding by eye, but this raised a visualisation problem: neither the BEDPE output nor the VCF output helped with visualisation. (IGV and other genome browsers work, but it is really hard to assess translocations, so I wanted to draw a Circos plot.)

  1. I have read about using snpEff or VEP (I am not sure about this). But as far as I know, snpEff does not support translocations, so it won't work for my case.

  2. Lastly, forgive my ignorance about the following terminology: BEDPE and readname. Could you give me at least a source about these two file formats? (For example, what is the difference between BEDPE and VCF? I know what a BED file is, but I couldn't link these together in the context of structural variation.)

Sorry for asking a bunch of questions. My graduate project is based on this data; therefore, I am trying to process this information as well as possible.

Thank you for your patience,

Best,

Tunc/.

Error running suave

127.0.0.1 - - [16/Mar/2016 12:32:44] "GET /depth/L1T/L1/1?n=10000 HTTP/1.1" 200 -
127.0.0.1 - - [16/Mar/2016 12:32:48] "GET /calls/1 HTTP/1.1" 500 -
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1836, in __call__
    return self.wsgi_app(environ, start_response)
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1820, in wsgi_app
    response = self.make_response(self.handle_exception(e))
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1403, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1477, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1381, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/opt/delly/vis/suave/suave_server.py", line 56, in calls
    chrom2 = chr2_re.search(row[7]).group(1)
AttributeError: 'NoneType' object has no attribute 'group'

Questions about DELLY 0.7.3

After changing my Delly to version 0.7.3, I compared the results of v0.7.2 and the new v0.7.3, and I was confused about some of the results. I would appreciate it if anyone could answer my questions.

  1. New result in the VCF file (CE: consensus sequence entropy): I know it describes the accuracy of the consensus sequence, however I do not understand how to judge it. Is smaller better? And can anyone give me its range?
  2. With the same data, I found that version 0.7.2 could detect insertion events, but the new version 0.7.3 cannot. I tested 4 times with different data, and version 0.7.3 never gave me any insertion events, whereas version 0.7.2 did.
  3. I am also confused about the filter; has it changed? I found that with the same arguments, version 0.7.3 gives more SV events than version 0.7.2. Can anyone explain this? I am also interested in the filter mechanism. I tried to filter by genotype (somatic sample: 0/1; germline sample: 0/0), but the result is much larger than your filtered output. Is there any other condition?
  4. About merge: I am confused about when to use it. To merge different samples, or something else? Can I use it to merge BCF files of different SV types?

Thanks a lot

Confusion about merge output of different SV types (DEL/DUP/INV/TRA/INS)

Hi, Tobias. Thanks for such a methodical piece of software for SV calling!

Recently, I tried Delly (delly_v0.7.3_linux_x86_64bit) to call several different types of SV (DEL/DUP/INV/TRA/INS) from my WGS data in the germline SV calling mode. Here is the command I used for DEL:

$delly merge -t DEL -m 0 -n 1000000 -o MAGIC.DEL1.bcf -b 1000 -r 0.8 IND1.bcf IND2.bcf IND3.bcf IND4.bcf IND5.bcf IND6.bcf IND7.bcf IND8.bcf IND9.bcf IND10.bcf

This generates a DEL site list (MAGIC.DEL1.bcf and MAGIC.DEL1.bcf.csi). The merge output files are similar when I merge DEL, DUP and INS, but when I turn to merging INV and TRA, I get more files:
INV: MAGIC.3to3.bcf and MAGIC.5to5.bcf
TRA: MAGIC.3to3.bcf, MAGIC.3to5.bcf, MAGIC.5to3.bcf and MAGIC.5to5.bcf

Q1: Could you please explain a bit more about these files?
Q2: For DEL, I can just use this command to re-genotype the merged DEL site list:
$delly call -t DEL -g MAGIC.fa -v MAGIC.DEL1.bcf -o Ind1.geno.bcf -x MT_CP.excl Ind.bcf
But for INV and TRA, how could I re-genotype with the list files above?

Best,
Leilei

Deviation from VCF spec

Hello,

According to the header, Delly (we are using v0.7.2) follows the VCFv4.1 format. Unfortunately, Delly does not seem to follow the VCF specs on some points, and this makes downstream analysis of Delly results difficult.

SVTYPE

TRA is not a valid SVTYPE according to the VCF spec.
Delly: Values: DEL, DUP, INV, TRA, INS.
VCF-spec: ##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant"> Value should be one of DEL, INS, DUP, INV, CNV, BND.

Translocations / breakends

Translocations / breakends should use the following format according to the VCF spec.

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
2   321681  bnd W   G   G]17:198982]    6   PASS    SVTYPE=BND
17  198982  bnd Y   A   A]2:321681] 6   PASS    SVTYPE=BND

It would be very useful if Delly would output these variants according to the vcf specifications.

read orientation on TRA function

Hi Tobias,
We are trying to figure out how Delly decides the type of read orientation for a translocation event. Inversions seem to be straightforward (3to3 or 5to5), however the duplication type (5to3) and deletion type (3to5) are not obvious to us. How do you decide which one is read-1 and which one is read-2 in a translocation event? Next, how do you decide whether a read orientation is of the deletion or duplication type?

The trouble we are having is that by flipping read-1 and read-2, the translocation can be of deletion or duplication type depending on your frame of reference.

Much appreciated. Thanks.

Delly run time for large WGS file

Many thanks for such a great SV tool combining split-read and paired-end discordancy signals!

I'd like to look for SVs in 20 WGS patient samples of a complex disease, plus some controls from the 1000 Genomes Project.
Each patient sample is around 70-80 GB; the controls are smaller, around 20 GB.

I'm wondering what the run-time cost of such a BAM file is. I've done a small test and it seems to run relatively slowly. Should I set -t INV/DEL/DUP to run each type separately? Or is there any way to run each chromosome separately?
Also, is it possible to run 20 WGS samples + 20 controls together? (Would that cost a lot of memory and be super slow?) Or should I run each sample independently?

I hope to hear any suggestions for running Delly on multiple big WGS BAM files.

Many thanks!

GATK error parsing DELLY2 VCF

Hello,

The DELLY VCFs write per-sample genotypes roughly like the following for no-call samples:

./.:.,.,.:0:LowQual:0:0:0:-1:0:0:0:0

note the ".,.,." for genotype likelihoods. GATK chokes on this with a parse error saying "." is not an integer. In similar no-call situations GATK writes these as 0,0,0. I can see an argument that "." ought to be acceptable, but it would be helpful if DELLY conformed to GATK to allow their variant manipulation tools to work. I fixed my outputs with sed for now.
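A minimal sketch of such a sed workaround (assuming the literal .,.,. token only occurs between colons in the genotype-likelihood field of no-call genotypes; verify this against your own VCF first):

sed 's/:\.,\.,\.:/:0,0,0:/g' delly.vcf > delly.gatk.vcf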

Thanks,
Ben

v0.7.3 Segmentation fault

Hi,

The new features in v0.7.3 sound great, however I'm getting seg faults when I try to run input that worked with v0.7.2.

Built on a Linux (CentOS) HPC cluster using:

git clone --recursive https://github.com/tobiasrausch/delly.git
cd delly/
make all

Any ideas?

(duplicate post on delly-users)
Thanks,
Jacob

Trouble with somatic filtering

Hi Tobias,

I have run delly call for DEL, DUP, INV and TRA, and I would now like to filter the somatic variants.

This is my line of script
delly filter -t DEL -f somatic -o /data/240del.pre.bcf -s /data/samples.tsv -g /data/hg19.fa /data/240del.bcf

My samples.tsv file looks like this:
240control control
240tumor tumor

But I am getting an error message:
Sample type for 240_control is neither tumor nor control

Could you please advise how I can fix this?

Thanks,
Rachel

Use local read-depth to further filter Delly results

Hi
I know Delly is based on read-pair and split-read signals, but I'm wondering whether there is any auxiliary tool for a local read-depth check of the calls, for further filtering?

I read in the supplementary document of the 1000 Genomes Project:
"Read-depth (RD) of all candidate deletions was annotated using ‘cov’, an auxiliary tool from the Delly package. The raw read-depth values...."

Obviously there is some read-depth filter tool, but I'm wondering where I can find it in the Delly software? Thanks!

potential bug causes long runtimes for translocations

Here is what I suspect is happening.

On line 1350 in src/delly.cpp:

if (reader.SetRegion(regionChr, regionStart, regionChr, regionEnd)) {

SetRegion occasionally gets called with regionChr as -1 when in translocation mode. This causes delly to run through many alignments in the bam file, with no consequence but increased runtime.

regionChr is likely -1 because an unmapped read with MateRefID set to -1 was not filtered, as it perhaps should have been.

Segmentation fault

Hi,

running DELLY (version 0.6.7) with a set of about 25 samples, all I get is an error.

command: delly -t DEL -g hs37d5.fa -o delly_DEL_sv.vcf -x human.hg19.excl.tsv 1.bam 2.bam 3.bam .... 25.bam

[2015-Sep-07 15:19:17] Paired-end clustering

0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
*Segmentation fault

Is it possible to get a more verbose error message, or do you know of possible reasons? I've run the same command with even more samples and everything went well.

Thanks.
Marten

segmentation fault

Hi

I got a segmentation fault with the latest delly v0.6.6 (delly_v0.6.6_linux_x86_64bit):
./delly -t DEL -o del.vcf -g hg19.fasta tmp.sorted.bam
[2015-Jul-07 16:57:39] ./delly -t DEL -o del.vcf -g hg19.fasta tmp.sorted.bam
[2015-Jul-07 16:57:39] Paired-end clustering

0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
**zsh: segmentation fault (core dumped) ./delly -t DEL -o del.vcf -g hg19.fasta

File tmp.sorted.bam can be downloaded from:
http://rain.ifmo.ru/~svkazakov/files/tmp.sorted.bam

OS:
Linux 3.13.0-43-generic, Ubuntu 14.04, 64-bit

Sergey.

Delly won't call any structural variants when the "R-" count dominates

Hi Tobias,

It seems that Delly most definitely won't call "any" structural variants if the FR- and RF- population is higher than the R+, F+ or F- pools.

I have almost never had a problem calling known gene-fusion events even with a very high proportion of the "R-" species, as long as the counts are still lower than the R+ pools. The only structural variants of interest to me are gene fusion events that occur over large distances (>10 Mb or so).

I still don't understand this very well, but any insights into why Delly completely draws a blank when the R- pools have the highest counts?

Attached is a plot with insert-sizes on the x-axis and counts on the y-axis.

s5_17178

Strand specific orientations
FF+ (0): 17528
FF- (1): 25675
FR+ (2): 6909342
FR- (3): 19978535
RF+ (4): 7492191
RF- (5): 22308715
RR+ (6): 4996
RR- (7): 4686

Strand independent orientations
F+ (0): 22524
F- (1): 30361
R+ (2): 14401533
R- (3): 42287250

Thanks a lot,
Sirisha

Translocation file doesn't have anything for chr1

I have run the delly translocation analysis, and for some reason nothing is showing up on chr1. Every other chromosome has at least some low-quality or PASS results, but nothing is showing up for chromosome 1. Chromosome 1 results are showing up for the deletion, duplication, and inversion analyses. Is this a bug?

somatic filtering python scripts

Hi, I have been able to run the latest version successfully on our data set. The latest update seems to have fixed the segmentation fault issue I was running into as well. I am now trying to run the Python somatic filtering scripts, but I am running into some issues installing the required Python modules, specifically vcf and re:

import argparse
import vcf
import numpy
import re
import banyan

any help would be appreciated,

Thanks,
Fouad

Error with Suave visualization

I'm having this error when trying to run Suave on paired WES samples. I ran delly to locate translocations, and converted the .bcf to .vcf. Here is the output:

10.7.203.62 - - [14/Jul/2016 13:03:09] "GET / HTTP/1.1" 200 -
10.7.203.62 - - [14/Jul/2016 13:03:09] "GET /static/googlefonts/stylesheet.css HTTP/1.1" 304 -
10.7.203.62 - - [14/Jul/2016 13:03:09] "GET /static/suave.css HTTP/1.1" 304 -
10.7.203.62 - - [14/Jul/2016 13:03:09] "GET /static/bootstrap/css/bootstrap.min.css HTTP/1.1" 304 -
10.7.203.62 - - [14/Jul/2016 13:03:09] "GET /static/jquery.min.js HTTP/1.1" 304 -
10.7.203.62 - - [14/Jul/2016 13:03:09] "GET /static/font-awesome/css/font-awesome.min.css HTTP/1.1" 304 -
10.7.203.62 - - [14/Jul/2016 13:03:09] "GET /static/d3.min.js HTTP/1.1" 304 -
10.7.203.62 - - [14/Jul/2016 13:03:09] "GET /static/bootstrap/js/bootstrap.min.js HTTP/1.1" 304 -
10.7.203.62 - - [14/Jul/2016 13:03:09] "GET /static/underscore-min.js HTTP/1.1" 304 -
10.7.203.62 - - [14/Jul/2016 13:03:09] "GET /static/typeahead.js/typeahead.bundle.min.js HTTP/1.1" 304 -
10.7.203.62 - - [14/Jul/2016 13:03:09] "GET /static/suave.js HTTP/1.1" 304 -
10.7.203.62 - - [14/Jul/2016 13:03:09] "GET /static/bootstrap/fonts/glyphicons-halflings-regular.woff2 HTTP/1.1" 304 -
10.7.203.62 - - [14/Jul/2016 13:03:09] "GET /static/font-awesome/fonts/fontawesome-webfont.woff2?v=4.3.0 HTTP/1.1" 304 -
10.7.203.62 - - [14/Jul/2016 13:03:16] "GET /chroms/GB1_tumor/control HTTP/1.1" 200 -
1 249250621 10000 0 2492506
suave_server.py:149: RuntimeWarning: invalid value encountered in true_divide
for r in np.log2(x_sum / y_sum / cfg['norm'])
suave_server.py:149: RuntimeWarning: divide by zero encountered in log2
for r in np.log2(x_sum / y_sum / cfg['norm'])
10.7.203.62 - - [14/Jul/2016 13:03:18] "GET /depth/GB1_tumor/control/1?n=10000 HTTP/1.1" 200 -
index /home/dmelnekoff/Delly_Output/GB1_test.vcf.gz.tbi not found
10.7.203.62 - - [14/Jul/2016 13:03:18] "GET /calls/1 HTTP/1.1" 500 -

Do I need to generate a .tbi file from my Delly-generated .vcf for this to work?

Thanks

error: boost/multiprecision/cpp_dec_float.hpp: No such file or directory

Hi Tobias,
I'm having trouble building delly. I have some of the Boost headers, but not the one listed in the title. Is there perhaps a specific version of Boost that I need?
cheers,
Mark

$ make all
cd src/htslib && make
make[1]: Entering directory `/home/marcow/apps/delly/src/htslib'
make[1]: Nothing to be done for `all'.
make[1]: Leaving directory `/home/marcow/apps/delly/src/htslib'
cd src/bamtools && mkdir -p build && cd build && cmake .. && make
-- Configuring done
-- Generating done
-- Build files have been written to: /home/marcow/apps/delly/src/bamtools/build
make[1]: Entering directory `/home/marcow/apps/delly/src/bamtools/build'
make[2]: Entering directory `/home/marcow/apps/delly/src/bamtools/build'
make[3]: Entering directory `/home/marcow/apps/delly/src/bamtools/build'
make[3]: Leaving directory `/home/marcow/apps/delly/src/bamtools/build'
make[3]: Entering directory `/home/marcow/apps/delly/src/bamtools/build'
[  0%] Exporting SharedHeaders
make[3]: Leaving directory `/home/marcow/apps/delly/src/bamtools/build'
[  1%] Built target SharedHeaders
make[3]: Entering directory `/home/marcow/apps/delly/src/bamtools/build'
make[3]: Leaving directory `/home/marcow/apps/delly/src/bamtools/build'
make[3]: Entering directory `/home/marcow/apps/delly/src/bamtools/build'
[  1%] Exporting APIHeaders
make[3]: Leaving directory `/home/marcow/apps/delly/src/bamtools/build'
[  2%] Built target APIHeaders
make[3]: Entering directory `/home/marcow/apps/delly/src/bamtools/build'
make[3]: Leaving directory `/home/marcow/apps/delly/src/bamtools/build'
make[3]: Entering directory `/home/marcow/apps/delly/src/bamtools/build'
[  2%] Exporting AlgorithmsHeaders
make[3]: Leaving directory `/home/marcow/apps/delly/src/bamtools/build'
[  3%] Built target AlgorithmsHeaders
make[3]: Entering directory `/home/marcow/apps/delly/src/bamtools/build'
make[3]: Leaving directory `/home/marcow/apps/delly/src/bamtools/build'
[ 41%] Built target BamTools
make[3]: Entering directory `/home/marcow/apps/delly/src/bamtools/build'
make[3]: Leaving directory `/home/marcow/apps/delly/src/bamtools/build'
[ 79%] Built target BamTools-static
make[3]: Entering directory `/home/marcow/apps/delly/src/bamtools/build'
make[3]: Leaving directory `/home/marcow/apps/delly/src/bamtools/build'
[ 82%] Built target jsoncpp
make[3]: Entering directory `/home/marcow/apps/delly/src/bamtools/build'
make[3]: Leaving directory `/home/marcow/apps/delly/src/bamtools/build'
[ 86%] Built target BamTools-utils
make[3]: Entering directory `/home/marcow/apps/delly/src/bamtools/build'
make[3]: Leaving directory `/home/marcow/apps/delly/src/bamtools/build'
[100%] Built target bamtools_cmd
make[2]: Leaving directory `/home/marcow/apps/delly/src/bamtools/build'
make[1]: Leaving directory `/home/marcow/apps/delly/src/bamtools/build'
g++ -isystem /g/solexa/bin/software/boost_1_53_0/include -isystem /home/marcow/apps/delly/src/bamtools/include -isystem /home/marcow/apps/delly/src/htslib/htslib -pedantic -W -Wall -Wno-unknown-pragmas -DNOPENMP -O9 -DNDEBUG src/delly.cpp -o src/delly -L/g/solexa/bin/software/boost_1_53_0/lib -lboost_iostreams -lboost_filesystem -lboost_system -lboost_program_options -lboost_date_time -L/home/marcow/apps/delly/src/bamtools/lib -lbamtools -lz -Wl,-rpath,/home/marcow/apps/delly/src/bamtools/lib,-rpath,/g/solexa/bin/software/boost_1_53_0/lib
In file included from /usr/lib/gcc/x86_64-redhat-linux/4.4.6/../../../../include/c++/4.4.6/backward/hash_set:60,
                 from /usr/include/boost/graph/adjacency_list.hpp:25,
                 from src/delly.cpp:27:
/usr/lib/gcc/x86_64-redhat-linux/4.4.6/../../../../include/c++/4.4.6/backward/backward_warning.h:28:2: warning: #warning This file includes at least one deprecated or antiquated header which may be removed without further notice at a future date. Please use a non-deprecated interface with equivalent functionality instead. For a listing of replacement headers and interfaces, consult the file backward_warning.h. To disable this warning use -Wno-deprecated.
src/delly.cpp:38:50: error: boost/multiprecision/cpp_dec_float.hpp: No such file or directory
In file included from src/delly.cpp:65:
src/junction.h:32:44: error: boost/multiprecision/cpp_int.hpp: No such file or directory
In file included from src/delly.cpp:55:
src/util.h: In function 'void torali::getLibraryParams(const boost::filesystem::path&, TLibraryMap&, double, short unsigned int)':
src/util.h:157: error: 'const struct boost::filesystem::basic_path<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, boost::filesystem::path_traits>' has no member named 'c_str'
In file included from src/delly.cpp:65:
src/junction.h: In function 'void torali::annotateJunctionReads(const TFiles&, const TGenome&, uint16_t, TSampleLibrary&, TSVs&, TCountMap&, torali::SVType<TTag>)':
src/junction.h:324: error: 'multiprecision' is not a member of 'boost'
src/junction.h:324: error: 'multiprecision' is not a member of 'boost'
src/junction.h:324: error: template argument 1 is invalid
src/junction.h:324: error: template argument 2 is invalid
src/junction.h:324: error: template argument 3 is invalid
src/junction.h:324: error: invalid type in declaration before ';' token
src/junction.h:327: error: expected initializer before '<' token
src/junction.h:328: error: 'TUniqueKmers' was not declared in this scope
src/junction.h:328: error: expected ';' before 'uniqueKmers'
src/junction.h:330: error: 'uniqueKmers' was not declared in this scope
src/junction.h:334: error: 'TUniqueKmers' is not a class or namespace
src/junction.h:334: error: expected ';' before 'itK'
src/junction.h:335: error: 'itK' was not declared in this scope
src/junction.h:347: error: expected ';' before 'uniqueRefKmers'
src/junction.h:350: error: 'uniqueRefKmers' was not declared in this scope
src/junction.h:355: error: 'TUniqueKmers' is not a class or namespace
src/junction.h:355: error: expected ';' before 'itK'
...

The headers that I do have:

$ ls /usr/include/boost/
accumulators/        bind.hpp                 concept_check/          dynamic_bitset.hpp           functional/                   integer_fwd.hpp        last_value.hpp    multi_index/                   parameter.hpp              property_tree/  regex.hpp                      smart_ptr.hpp        token_iterator.hpp  variant/
algorithm/           blank_fwd.hpp            concept_check.hpp       dynamic_property_map.hpp     functional.hpp                integer.hpp            lexical_cast.hpp  multi_index_container_fwd.hpp  pending/                   proto/          scoped_array.hpp               spirit/              tokenizer.hpp       variant.hpp
aligned_storage.hpp  blank.hpp                config/                 enable_shared_from_this.hpp  function_equal.hpp            integer_traits.hpp     limits.hpp        multi_index_container.hpp      pointee.hpp                ptr_container/  scoped_ptr.hpp                 spirit.hpp           tr1/                vector_property_map.hpp
any.hpp              call_traits.hpp          config.hpp              exception/                   function.hpp                  interprocess/          logic/            next_prior.hpp                 pointer_cast.hpp           python/         scope_exit.hpp                 statechart/          tuple/              version.hpp
archive/             cast.hpp                 crc.hpp                 exception.hpp                function_output_iterator.hpp  intrusive/             make_shared.hpp   noncopyable.hpp                pointer_to_other.hpp       python.hpp      serialization/                 static_assert.hpp    type.hpp            visit_each.hpp
array.hpp            cerrno.hpp               cregex.hpp              exception_ptr.hpp            function_types/               intrusive_ptr.hpp      math/             nondet_random.hpp              pool/                      random/         shared_array.hpp               strong_typedef.hpp   typeof/             wave/
asio/                checked_delete.hpp       cstdint.hpp             filesystem/                  fusion/                       io/                    math_fwd.hpp      none.hpp                       preprocessor/              random.hpp      shared_container_iterator.hpp  swap.hpp             type_traits/        wave.hpp
asio.hpp             circular_buffer/         cstdlib.hpp             filesystem.hpp               generator_iterator.hpp        io_fwd.hpp             mem_fn.hpp        none_t.hpp                     preprocessor.hpp           range/          shared_ptr.hpp                 system/              type_traits.hpp     weak_ptr.hpp
assert.hpp           circular_buffer_fwd.hpp  current_function.hpp    flyweight/                   get_pointer.hpp               iostreams/             memory_order.hpp  non_type.hpp                   program_options/           range.hpp       signal.hpp                     test/                units/              xpressive/
assign/              circular_buffer.hpp      date_time/              flyweight.hpp                gil/                          is_placeholder.hpp     mpi/              numeric/                       program_options.hpp        rational.hpp    signals/                       thread/              unordered/
assign.hpp           compatibility/           date_time.hpp           foreach.hpp                  graph/                        iterator/              mpi.hpp           operators.hpp                  progress.hpp               ref.hpp         signals2/                      thread.hpp           unordered_map.hpp
bimap/               compressed_pair.hpp      detail/                 format/                      implicit_cast.hpp             iterator_adaptors.hpp  mpl/              optional/                      property_map/              regex/          signals2.hpp                   throw_exception.hpp  unordered_set.hpp
bimap.hpp            concept/                 dynamic_bitset/         format.hpp                   indirect_reference.hpp        iterator.hpp           multi_array/      optional.hpp                   property_map.hpp           regex_fwd.hpp   signals.hpp                    timer.hpp            utility/
bind/                concept_archetype.hpp    dynamic_bitset_fwd.hpp  function/                    integer/                      lambda/                multi_array.hpp   parameter/                     property_map_iterator.hpp  regex.h         smart_ptr/                     token_functions.hpp  utility.hpp
$ ls /usr/lib64/libboost_*
/usr/lib64/libboost_date_time-mt.so     /usr/lib64/libboost_filesystem.so.5    /usr/lib64/libboost_iostreams.so              /usr/lib64/libboost_program_options-mt.so.5  /usr/lib64/libboost_regex-mt.so            /usr/lib64/libboost_serialization.so.5  /usr/lib64/libboost_system.so                    /usr/lib64/libboost_unit_test_framework.so.5
/usr/lib64/libboost_date_time-mt.so.5   /usr/lib64/libboost_graph-mt.so        /usr/lib64/libboost_iostreams.so.5            /usr/lib64/libboost_program_options.so       /usr/lib64/libboost_regex-mt.so.5          /usr/lib64/libboost_signals-mt.so       /usr/lib64/libboost_system.so.5                  /usr/lib64/libboost_wave-mt.so
/usr/lib64/libboost_date_time.so        /usr/lib64/libboost_graph-mt.so.5      /usr/lib64/libboost_prg_exec_monitor-mt.so    /usr/lib64/libboost_program_options.so.5     /usr/lib64/libboost_regex.so               /usr/lib64/libboost_signals-mt.so.5     /usr/lib64/libboost_thread-mt.so                 /usr/lib64/libboost_wave-mt.so.5
/usr/lib64/libboost_date_time.so.5      /usr/lib64/libboost_graph.so           /usr/lib64/libboost_prg_exec_monitor-mt.so.5  /usr/lib64/libboost_python-mt.so             /usr/lib64/libboost_regex.so.5             /usr/lib64/libboost_signals.so          /usr/lib64/libboost_thread-mt.so.5               /usr/lib64/libboost_wserialization-mt.so
/usr/lib64/libboost_filesystem-mt.so    /usr/lib64/libboost_graph.so.5         /usr/lib64/libboost_prg_exec_monitor.so       /usr/lib64/libboost_python-mt.so.5           /usr/lib64/libboost_serialization-mt.so    /usr/lib64/libboost_signals.so.5        /usr/lib64/libboost_unit_test_framework-mt.so    /usr/lib64/libboost_wserialization-mt.so.5
/usr/lib64/libboost_filesystem-mt.so.5  /usr/lib64/libboost_iostreams-mt.so    /usr/lib64/libboost_prg_exec_monitor.so.5     /usr/lib64/libboost_python.so                /usr/lib64/libboost_serialization-mt.so.5  /usr/lib64/libboost_system-mt.so        /usr/lib64/libboost_unit_test_framework-mt.so.5  /usr/lib64/libboost_wserialization.so
/usr/lib64/libboost_filesystem.so       /usr/lib64/libboost_iostreams-mt.so.5  /usr/lib64/libboost_program_options-mt.so     /usr/lib64/libboost_python.so.5              /usr/lib64/libboost_serialization.so       /usr/lib64/libboost_system-mt.so.5      /usr/lib64/libboost_unit_test_framework.so       /usr/lib64/libboost_wserialization.so.5

Run for a large number of WGS samples

Hi Tobias,

Thanks for the nice tool. I tried to post this question on the Google group but could not get permission, so I am leaving the question here.

I plan to run a large number (~70) of WGS trios using Delly. When I ran them with a chr22 BAM (using the multi-threading option), it worked properly. But for the entire genome, I have no idea about the computational requirements. Also, I am not sure about the calling accuracy of a multi-sample run versus a single-sample run. Could I get your advice? Or can I run them by trios, or in subsets of samples?

SV type arguments

Hi Tobias,

Do we still have to run each SV type's analysis separately with the latest version of DELLY?

Thanks

Raj

Question on Delly's Inversion detection

Hello Delly developers,

I am trying to detect whether there are any large inversions (around 1 Mbp to 10 Mbp) that may have resulted from large deletions. I am relatively new to command-line usage. So far, I have tried other SV prediction software such as pindel, lumpy and sv-detect, but they have only detected inversions of less than 200 bp. I am not sure if this is because of the alignment or the SV prediction software.

Our mate-pair library has a mean insert size of around 2,000-3,000 bp, and I am attempting to detect a very large inversion (2,000,000+ bp).

My question is, would Delly be able to detect an inversion that large?
I am not sure if split-read analysis would be able to predict such a large inversion, or if the algorithm would work well with larger insert sizes.

Thanks!

Germline filtering gives no calls

I used delly filter to filter calls, but it turns out no calls were output. This is the command I used:

delly filter -t DEL -f germline -o a.bcf -g hg.fa rawcalls.bcf

There are calls in the rawcalls.bcf.

Any idea? Thanks.

Dockerfile problem: version chardet==2.0.1

Hello,
when using the Dockerfile, I get an error with pip install chardet==2.0.1 from delly/complexVariants/requirements.txt.
pip can't find this version:

" Could not find a version that satisfies the requirement chardet==2.0.1 (from -r delly/complexVariants/requirements.txt (line 3)) (from versions: 1.0, 1.0.1, 1.1, 2.1.1, 2.2.1, 2.3.0)
No matching distribution found for chardet==2.0.1 (from -r delly/complexVariants/requirements.txt (line 3))
You are using pip version 8.1.1, however version 8.1.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command."

I tried to use another version of chardet, such as 2.1.1, but I got another error.
Perhaps upgrading pip would help?

Cygwin support for bcftools

Hello,

Had an error of "undefined reference" in bcftools:

/home/UCSF YU LAB/coolstuff/delly/src/bcftools/version.c:35: undefined reference to `hts_version'
/home/UCSF YU LAB/coolstuff/delly/src/bcftools/version.c:35:(.text+0x13): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `hts_version'
/tmp/ccggD26L.o: In function `init':
/home/UCSF YU LAB/coolstuff/delly/src/bcftools/plugins/fill-AN-AC.c:43: undefined reference to `bcf_hdr_append'
/home/UCSF YU LAB/coolstuff/delly/src/bcftools/plugins/fill-AN-AC.c:43:(.text+0x2d): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `bcf_hdr_                                              append'
/home/UCSF YU LAB/coolstuff/delly/src/bcftools/plugins/fill-AN-AC.c:44: undefined reference to `bcf_hdr_append'
/home/UCSF YU LAB/coolstuff/delly/src/bcftools/plugins/fill-AN-AC.c:44:(.text+0x40): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `bcf_hdr_                                              append'
/tmp/ccggD26L.o: In function `process':
/home/UCSF YU LAB/coolstuff/delly/src/bcftools/plugins/fill-AN-AC.c:51: undefined reference to `bcf_calc_ac'
/home/UCSF YU LAB/coolstuff/delly/src/bcftools/plugins/fill-AN-AC.c:51:(.text+0x80): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `bcf_calc                                              _ac'
/home/UCSF YU LAB/coolstuff/delly/src/bcftools/plugins/fill-AN-AC.c:56: undefined reference to `bcf_update_info'
/home/UCSF YU LAB/coolstuff/delly/src/bcftools/plugins/fill-AN-AC.c:56:(.text+0xe7): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `bcf_upda                                              te_info'
/home/UCSF YU LAB/coolstuff/delly/src/bcftools/plugins/fill-AN-AC.c:57: undefined reference to `bcf_update_info'
/home/UCSF YU LAB/coolstuff/delly/src/bcftools/plugins/fill-AN-AC.c:57:(.text+0x11b): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `bcf_upd                                              ate_info'
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:125: plugins/fill-AN-AC.so] Error 1
make[1]: Leaving directory '/home/UCSF YU LAB/coolstuff/delly/src/bcftools'
make: *** [Makefile:52: .bcftools] Error 2

I found another thread with the same problem, samtools/bcftools#377.
I think maybe the bcftools plugins cannot be built on Cygwin?

On the same thread there is a commit that added Cygwin dll support onto htslib:
samtools/htslib@2cddfb3

I am not a programmer, and I need some guidance on how I could apply that commit to my Delly sources to get it working.

Thanks for your time!

-Odoland

Question: how to run the delly_parallel version

Hi,
Originally, I thought it should be run like this:
export OMP_NUM_THREADS=3
mpirun -n 3 $path/delly_parallel -t ...

Note: this will use 3 processes, and each process has multiple threads (dynamic). I'm not sure whether the processes are duplicated or truly run in parallel.

or just like this:
export OMP_NUM_THREADS=3
$path/delly_parallel -t ...

I'm confused. Many thanks in advance.

delly returns 1 when calling help

Hello Tobias,

delly returns 1 when the help is called (my expectation is 0). I need this feature for an automated sanity check when delly is installed.

Thanks!
Oliver

$ delly -?; echo $?
**********************************************************************
Program: Delly
This is free software, and you are welcome to redistribute it under
certain conditions (GPL); for license details use '-l'.
This program comes with ABSOLUTELY NO WARRANTY; for details use '-w'.

Delly (Version: 0.7.1)
Contact: Tobias Rausch ([email protected])
**********************************************************************

Usage: delly [OPTIONS] -g <ref.fa> <sample1.sort.bam> <sample2.sort.bam> ...

Generic options:
  -? [ --help ]                       show help message
  -t [ --type ] arg (=DEL)            SV type (DEL, DUP, INV, TRA, INS)
  -o [ --outfile ] arg (="sv.vcf")    SV output file
  -x [ --exclude ] arg (="")          file with chr to exclude

PE options:
  -q [ --map-qual ] arg (=1)          min. paired-end mapping quality
  -s [ --mad-cutoff ] arg (=9)        insert size cutoff, median+s*MAD 
                                      (deletions only)
  -f [ --flanking ] arg (=90)         quality of the consensus alignment

SR options:
  -g [ --genome ] arg                 genome fasta file
  -m [ --min-flank ] arg (=13)        minimum flanking sequence size
  -n [ --noindels ]                   no small InDel calling
  -i [ --indelsize ] arg (=500)       max. small InDel size

Genotyping options:
  -v [ --vcfgeno ] arg (="site.vcf")  input vcf file for genotyping only
  -u [ --geno-qual ] arg (=5)         min. mapping quality for genotyping

1

Trouble getting Delly to find obvious deletion in test data

I generated a simple test data set where I introduced a 1kb deletion and simulated paired-end read library prep and sequencing using simNGS, followed by alignment using bwa:
https://www.dropbox.com/sh/308kqoihum4ccwu/BJcVBq-kWa

I ran Delly against the data, but no deletions were reported. The exact command used:

delly -g output.fa -t DEL sorted.bam

As a sanity check, I tried using Pindel and was able to successfully find the deletion.

I've played around with other Delly parameters but was unsuccessful in finding a working set.

Delly 0.5.3 segfaults on TRA

I have problems getting Delly to analyse my sample with type "TRA":

<SNIP> (I have removed the output above, as that doesn't seem to relate to the problem)
[pid 22642] read(4, "GCACTTTCTATCTC\nGGAAACCTTTCAACTAA"..., 16384) = 16384
[pid 22642] read(4, "GAAAGTTGGAGTTTTTCAGCGTTTGCGTTCCA"..., 16384) = 16384
[pid 22642] write(1, "*", 1*)            = 1
[pid 22642] write(1, "*", 1*)            = 1
[pid 22642] write(1, "*", 1*)            = 1
[pid 22642] futex(0x2b1cbaa451b0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
[pid 22642] write(2, "terminate called after throwing "..., 48terminate called after throwing an instance of ') = 48
[pid 22642] write(2, "std::length_error", 17std::length_error) = 17
[pid 22642] write(2, "'\n", 2'
)          = 2
[pid 22642] write(2, "  what():  ", 11  what():  ) = 11
[pid 22642] write(2, "basic_string::_S_create", 23basic_string::_S_create) = 23
[pid 22642] write(2, "\n", 1
)           = 1
[pid 22642] rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
[pid 22642] tgkill(22642, 22642, SIGABRT) = 0
[pid 22642] --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=22642, si_uid=1005} ---
[pid 22642] +++ killed by SIGABRT (core dumped) +++
<... wait4 resumed> [{WIFSIGNALED(s) && WTERMSIG(s) == SIGABRT && WCOREDUMP(s)}], 0, NULL) = 22642
open("/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=2570, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2af29506d000
read(3, "# Locale name alias data base.\n#"..., 4096) = 2570
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [CHLD], 8) = 0
brk(0x214b000)                          = 0x214b000
rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [CHLD], 8) = 0
brk(0x214c000)                          = 0x214c000
rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0
read(3, "", 4096)                       = 0
close(3)                                = 0
munmap(0x2af29506d000, 4096)            = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [CHLD], 8) = 0
rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0
open("/usr/share/locale/en_US/LC_MESSAGES/bash.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en/LC_MESSAGES/bash.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale-langpack/en_US/LC_MESSAGES/bash.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale-langpack/en/LC_MESSAGES/bash.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [CHLD], 8) = 0
rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [CHLD], 8) = 0
rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0
open("/usr/share/locale/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale-langpack/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale-langpack/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
write(2, "./run_delly.sh: line 29: 22642 A"..., 152./run_delly.sh: line 29: 22642 Aborted                 (core dumped) delly --type  ${type} --outfile ${type}.vcf --genome $REFERENCE ${LOCATION}/${GLOB}
) = 152
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_DUMPED, si_pid=22642, si_status=SIGABRT, si_utime=1291884, si_stime=122927} ---
wait4(-1, 0x7fff1a1ef7d8, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn()                          = 0
rt_sigaction(SIGINT, {0x456570, [], SA_RESTORER, 0x2af2956c9ff0}, {0x43f800, [], SA_RESTORER, 0x2af2956c9ff0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
kill(0, SIGTERMProcess 22618 detached
 <detached ...>
Terminated

The script I used to generate this:

 cat run_delly.sh
#!/bin/bash

# Make the script stop if something goes wrong
set -o errexit
# Kill all child processes on exit (see http://stackoverflow.com/questions/360201/kill-background-process-when-shell-script-exit )
# This doesn't remove the temporary directory we create below.
trap "kill 0" SIGINT SIGTERM EXIT # with gnu paralell this doesnt work properly, too early a sigterm is recieved.

# Debugging
#set -o verbose
#set -o xtrace

#settings and binaries
threads=40

mkdir -p /mnt/nexenta/haars001/projects/yeast_evolution/delly_running && cd $_
export LD_LIBRARY_PATH=/mnt/nexenta/haars001/projects/yeast_evolution/delly/bamtools/lib:$LD_LIBRARY_PATH
export PATH=/mnt/nexenta/haars001/projects/yeast_evolution/delly/delly/src:$PATH
export OMP_NUM_THREADS=$threads

LOCATION=/mnt/nexenta/haars001/projects/yeast_evolution/mapping
REFERENCE=${LOCATION}/S288C_chromosomes_reference_genome_Current_Release.fasta
EXTENSION='.sam.gz.bam.rocksort.bam'
GLOB='Sample_*'${EXTENSION}
SV_TYPE="DEL DUP INV TRA"
SV_TYPE="TRA"

# Loop through different types
for type in $SV_TYPE
do
    echo $type
    delly --type ${type} --outfile ${type}.vcf  --genome $REFERENCE ${LOCATION}/${GLOB}
done

The version I used was compiled with debug from this commit:

git log | head
commit e513dcaeed79cf7cd448e7471633b6d6baf01f72
Merge: 04b6a22 3772d4c
Author: tobiasrausch <[email protected]>
Date:   Fri May 9 10:11:10 2014 +0200

    Merge pull request #5 from jvhaarst/patch-1

    Point to Heng Li's Github repository of seqtk.

commit 04b6a22f7d64193adb3418ecf0f5aa2097d34ef4

If you need me to run other tests, feel free to ask, I would really like to run Delly on my data.

Compilation problem: getStrandSpecificOrientation doubly defined

Hi,

I'm trying to compile the latest Delly commit (July 25) on Linux with gcc 4.7.1. The following error happens.

[rxc48@cyberstar21 delly]$ make clean
rm -f src/delly src/iover src/cov src/spancov src/iMerge src/extract  src/delly.o src/iover.o src/cov.o src/spancov.o src/iMerge.o src/extract.o
[rxc48@cyberstar21 delly]$ make
g++ -I/gpfs/cyberstar/p/pzm11/nobackup/rxc48/.linuxbrew/include -L/gpfs/cyberstar/p/pzm11/nobackup/rxc48/.linuxbrew/lib -I/gpfs/home/rxc48/include -L/gpfs/home/rxc48/lib -isystem /usr/global/intel-13/boost/1.54.0/include -isystem /gpfs/home/rxc48/tools/bamtools//include -isystem /gpfs/home/rxc48/tools/kseq -pedantic -W -Wall -Wno-unknown-pragmas -DNOPENMP -O9 -DNDEBUG src/delly.cpp -o src/delly -L/gpfs/cyberstar/p/pzm11/nobackup/rxc48/.linuxbrew/lib  -L/gpfs/home/rxc48/lib -L/usr/global/intel-13/boost/1.54.0/lib -lboost_iostreams -lboost_filesystem -lboost_system -lboost_program_options -lboost_date_time -L/gpfs/home/rxc48/tools/bamtools//lib -lbamtools -lz
In file included from src/align_gotoh.h:32:0,
                 from src/delly.cpp:58:
src/record.h:467:9: error: redefinition of ‘template<class TRecord> int torali::getStrandSpecificOrientation(const TRecord&)’
In file included from src/util.h:28:0,
                 from src/delly.cpp:55:
src/tags.h:90:5: error: ‘template<class TBamRecord> int torali::getStrandSpecificOrientation(const TBamRecord&)’ previously declared here
In file included from src/delly.cpp:65:0:
src/junction.h: In function ‘int ks_getuntil(kstream_t*, int, kstring_t*, int*)’:
src/junction.h:39:1: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
make: *** [src/delly] Error 1

Indeed, it seems that `getStrandSpecificOrientation` is defined in two headers, with different template parameter names.
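
A quick way to confirm the clash from the source tree (just a sketch; it only locates the competing definitions, the actual fix is to keep a single one):

grep -n "getStrandSpecificOrientation" src/tags.h src/record.h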

Faulty Makefile?

I pulled the latest version today, but it won't build from source.

I think the issue is that bootstrap.sh is listed in the Makefile but doesn't seem to exist.
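
To see where the missing script is still referenced (a sketch, assuming the top-level Makefile of the checkout):

grep -n "bootstrap.sh" Makefile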

suggestion: adding dependencies documentation

Dear Tobias,

just a suggestion.

Maybe you could add to the documentation that the Python helper scripts rely on the Python modules numpy, pyvcf and banyan, and that Python >= 2.7 is needed (2.6 can be used if the ordereddict module is installed).
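
As a sketch of what such a note could show (package names as listed above; availability of pip is an assumption):

pip install numpy pyvcf banyan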

regards

Eric

Delly install

I'm trying to install delly and I keep getting this error:

g++ -isystem /Users/julieballard/Downloads/delly/delly/src/modular-boost -isystem /Users/julieballard/Downloads/delly/delly/src/bamtools/include -isystem /Users/julieballard/Downloads/delly/delly/src/htslib/htslib -pedantic -W -Wall -Wno-unknown-pragmas -DNOPENMP -O9 -DNDEBUG src/delly.cpp -o src/delly -L/Users/julieballard/Downloads/delly/delly/src/modular-boost/stage/lib -lboost_iostreams -lboost_filesystem -lboost_system -lboost_program_options -lboost_date_time -L/Users/julieballard/Downloads/delly/delly/src/bamtools/lib -lbamtools -lz -Wl,-rpath,/Users/julieballard/Downloads/delly/delly/src/bamtools/lib,-rpath,/Users/julieballard/Downloads/delly/delly/src/modular-boost/stage/lib
error: invalid value '9' in '-O9'
make: *** [src/delly] Error 1
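
One possible workaround, assuming the -O9 flag is set in the top-level Makefile (the clang compiler behind g++ on macOS rejects -O9; -O3 is the highest standard level):

# Lower the optimization level and rebuild; the .bak suffix keeps a backup of the Makefile.
sed -i.bak 's/-O9/-O3/g' Makefile
make clean && make all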

any clue?
thanks
yosr

Inversion in VCF file

I see entries indicating an inversion in the INV.vcf file where the filter is set to PASS. However, the genotype is also set to 0/0 for these sites.

How should the output for these types of sites be interpreted? Is there evidence for an inversion in this individual or not? And why is a line generated at all if the inversion is not supported by the called genotype?

Here is an excerpt of an example line.

PASS PRECISE;CIEND=-49,49;CIPOS=-49,49;SVTYPE=INV;SVMETHOD=EMBL.DELLY;END=44336073;SVLEN=670954;CT=5to5;PE=3;MAPQ=37;SR=7;SRQ=0.824324;CONSENSUS=GGGGGGGTGGGAATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGAGATAGAGTCTCACTTGGTCACCCAGGCTGGAGTGCAGTTGCGCAATCTTGGCTCACTGCA GT:GL:GQ:FT:RC:DR:DV 0/0:0,-9.30627,-108.074:93:PASS:295705:31:0
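
For a quick overview of which inversion records carry a non-reference genotype in your file, something like the following may help (a sketch; it assumes bcftools is installed and INV.vcf is the file quoted above):

# Keep only sites where at least one sample has a non-reference genotype.
bcftools view -i 'GT="alt"' INV.vcf > INV.carriers.vcf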

John

Confusing about CT

INFO=<ID=CT,Number=1,Type=String,Description="Paired-end signature induced connection type">

I am using delly; however, I am confused about the INFO field 'CT' (3to3, 3to5, 5to3, 5to5).

I have read the reference, and I know it is about order and inversion, but I am still confused about it.

Can anyone give a concrete definition of these four types? (Where do the two reads align? What is the orientation of the reads? If we label the three relevant breakpoints in the reference sequence as From_start, From_end (= From_start + length) and To, which two breakpoints correspond to which type?)
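
One way to get a feel for it on your own calls is to tabulate SVTYPE against CT (a sketch; it assumes bcftools and a delly output file named delly.bcf):

# Count how often each SVTYPE/CT combination occurs.
bcftools query -f '%INFO/SVTYPE\t%INFO/CT\n' delly.bcf | sort | uniq -c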

Thanks

Two different REF allele for same position

Delly generated the following calls from two different BAM files in two different runs. The same reference was used for both runs. The issue is that the two VCF files ended up with two different reference alleles for the same position, which should not happen. Delly v0.7.5 is being used.

1       724817  DEL00000000     A       <DEL>   0       PASS    PRECISE;SVTYPE=DEL;SVMETHOD=EMBL.DELLYv0.7.5;CHR2=1;END=224200118;INSLEN=0;HOMLEN=53;PE=3;MAPQ=37;CT=3to5;CIPOS=-53,53;CIEND=-53,53;SR=8;SRQ=0.990476;CONSENSUS=ATGCAATGTAATGGACTCGAATGGAACGGAATGGAATGGACAAGAATTGAATTGAATGGACTGGAATGGAATGGAATGGAATGCAATGGAATGCACTCGAACGGA;CE=1.86058    GT:GL:GQ:FT:RCL:RC:RCR:CN:DR:DV:RR:RV   0/1:-19.0868,0,-13.3173:133:PASS:0:61331624:0:-1:34:3:5:9
1       724817  DEL00000000     T       <DEL>   0       PASS    PRECISE;SVTYPE=DEL;SVMETHOD=EMBL.DELLYv0.7.5;CHR2=1;END=224200118;INSLEN=0;HOMLEN=53;PE=2;MAPQ=37;CT=3to5;CIPOS=-53,53;CIEND=-53,53;SR=7;SRQ=0.990741;CONSENSUS=CAATGTAATGGACTCGAATGGAACGGAATGGAATGGACAAGAATTGAATTGAATGGACTGGAATGGAATGGAATGGAATGCAATGGAATGCACTCGAACGGATTGGAA;CE=1.85864 GT:GL:GQ:FT:RCL:RC:RCR:CN:DR:DV:RR:RV   1/1:-18.8828,-2.08997,0:21:PASS:0:79611714:0:-1:52:2:0:7
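
A quick way to see which of the two records matches the reference is to extract the base at that position (a sketch; reference.fasta stands for the FASTA used in both runs):

samtools faidx reference.fasta 1:724817-724817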

BCF output samplename

I'm using version:

delly_v0.7.3_linux_x86_64bit

The sample name reported in the BCF/VCF is based on the filename of the input BAM. This is not the desired name. I'd expect the sample name to be derived from:

@RG ID:70001-5 PU:na LB:9834643 SM:700014 PL:illumina

The SM field?
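
As a workaround, the sample can be renamed in the output afterwards (a sketch; it assumes bcftools, with sample_map.txt holding one "old_name new_name" pair per line):

bcftools reheader -s sample_map.txt -o renamed.bcf delly.bcf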

requiring very large mem for INS

Dear Delly Developers,

I've been trying to use your tool to call variants. It works well for DEL, DUP, INV and TRA. However, I run into problems with INS: it consumes more than 64 GB of memory. Is that expected? I am running it as below.

./src/delly -t ${sv} \
    -o ${out} \
    -g ${ref} \
    -x ${exclude} \
    ${normal1} \
    ${normal2} \
    ${normal3} \
    ${normal4} \
    ${tumour}
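
To report a concrete peak-memory number, the run can be wrapped in GNU time (a sketch; /usr/bin/time must be the GNU version for -v, and the variables are the same as above):

/usr/bin/time -v ./src/delly -t INS -o ${out} -g ${ref} -x ${exclude} ${normal1} ${tumour}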

Thanks very much,
Zhihao

Algorithm of read-depth filter function in Delly

I have a couple of questions about the newly added filter function for germline SVs in Delly.

  1. What is the algorithm behind the read-depth (RD) filter? (Was a Gaussian mixture model applied to model the read-depth distribution and assign copy-number states? That is what I found in the 1000 Genomes Project supplementary material.)
    Also, what information is it using? The filter function seems to rely only on the VCF output file, not the BAM, so it must retrieve everything from the VCF.
  2. Is the filter applied sample by sample? I mean, is filtering for patient A independent of patient B if I called SVs for both patients into one VCF file?
  3. For my sample, the raw Delly deletion calling gave me 7000 calls, but after "filter" only 400 remain. Is that normal?
  4. If I find the filtering too stringent, which parameters would you recommend tweaking (see the sketch after this list)? I found:
    -e [ --rddel ] arg (=0.800000012) max. read-depth ratio of carrier vs. non-carrier for a deletion

Could you please explain briefly what "carrier" and "non-carrier" mean here?
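
As a sketch of the kind of tweak meant in point 4 (the germline filter subcommand and the 0.9 threshold are assumptions; check delly filter --help for the exact interface of your version):

# Re-run the germline filter with a more permissive deletion read-depth cutoff.
delly filter -f germline -e 0.9 -o relaxed.bcf delly.bcf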

Many many thanks!

suggestion :: test suite

dear Tobias,

Would it be possible to embed a test suite in the delly distribution?

This would allow users to check whether compilation and installation went smoothly.
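
Until such a suite exists, a minimal post-build smoke test could look like this (only a sketch, not part of delly; it just checks that the freshly built binary is executable and prints its usage without crashing):

#!/bin/bash
# Smoke-test sketch: run from the top of the delly checkout after "make all".
set -e
test -x src/delly
# Even if delly exits non-zero when called without arguments, the pipeline's
# exit status is head's, so this only fails if the binary cannot be executed.
src/delly 2>&1 | head -n 5
echo "smoke test finished"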

regards

Eric

#include-ing htslib headers

HTSlib's headers and installation scripts are set up so that the public headers are all installed into an htslib subdirectory of your include directory. The intention is that source code should say e.g.

#include "htslib/kseq.h"

and then, if the HTSlib headers are installed as /usr/include/htslib/kseq.h, you won't need any -I or -isystem compiler options at all.

It would be easier to build delly against an already-installed HTSlib if delly followed this convention too. This would just mean changing the Makefile to point at HTSlib's top-level directory:

diff --git a/Makefile b/Makefile
-SEQTK_ROOT    = ${PWD}/src/htslib/htslib
+SEQTK_ROOT    = ${PWD}/src/htslib

and changing src/extract.h and src/junction.h to #include "htslib/kseq.h" instead.
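
The header change could be scripted in one step (a sketch; it assumes both files currently contain #include "kseq.h" without the directory prefix):

sed -i 's|#include "kseq.h"|#include "htslib/kseq.h"|' src/extract.h src/junction.h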


Alternatively, if kseq.h is the only part of HTSlib that you plan to use, I would suggest just copying kseq.h from its original home, attractivechaos/klib, into the delly source tree (this is how klib is designed to be used) and avoiding the HTSlib dependency entirely.

segmentation fault

Hi

When I tried to run the latest version of delly (cloned from the git repo) with 3 bacterial genomes, I got a segmentation fault. This isn't a very large genome (~6 MB), and the BAM files are ~1 GB.

delly-0.1.2 -t DEL -o Exp6_LJF.del.vcf -g RefSeq-Pseudomonas_aeruginosa_PA01.genomic.fna Exp6_LJF_ampXPA01.merged.sorted.bam Exp6_LJF_B5a.merged.sorted.bam Exp6_LJF_A4a.merged.sorted.bam

[2013-Dec-07 14:54:19] Paired-end clustering

0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|


Segmentation fault (core dumped)

Here are some specs for the machine running it:
RedHat 6.4, x86_64, 24 CPUs, 48 GB of RAM
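
In case a backtrace helps the developers, something like this can capture one (a sketch; it assumes gdb is installed, core dumps are enabled, and the delly-0.1.2 binary is reachable in the current directory, otherwise give gdb its full path; the core file may also be named core.<pid>):

ulimit -c unlimited
delly-0.1.2 -t DEL -o Exp6_LJF.del.vcf -g RefSeq-Pseudomonas_aeruginosa_PA01.genomic.fna Exp6_LJF_ampXPA01.merged.sorted.bam Exp6_LJF_B5a.merged.sorted.bam Exp6_LJF_A4a.merged.sorted.bam
gdb -batch -ex bt delly-0.1.2 core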

cheers
steve
