I use svtyper in smoove. After reading

comparison to svtyper about paragraph HOT 10 CLOSED

illumina commented on July 19, 2024

comparison to svtyper

from paragraph.

Comments (10)

brentp commented on July 19, 2024

hi, thanks for the response. Is a VCF of the LRGT set available?
I don't see svtyper deletions on the GIAB set in the supplement, only Manta and Paragraph in Table S2.

On lumpy output (where svtyper genotypes give ~90% precision and 84% recall), paragraph gives only 20% recall and 98% precision.

I will likely leave this for now, feel free to close the issue. It would have saved me some time to know that paragraph currently won't work with lumpy output (due to lack of precision in breakpoints?)
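
For readers who want the arithmetic behind the figures quoted above, here is a minimal Python sketch of precision and recall. The counts are hypothetical, picked only so the results land near the quoted percentages; they are not taken from the evaluation itself.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple:
    """Return (precision, recall) from true-positive, false-positive
    and false-negative counts. Empty denominators give 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts (assumption): roughly 90% precision and 84% recall.
high_recall = precision_recall(tp=840, fp=93, fn=160)

# Hypothetical counts (assumption): roughly 98% precision and 20% recall.
low_recall = precision_recall(tp=200, fp=4, fn=800)
```

A genotyper can be very precise and still miss most true events, which is the pattern described for paragraph on lumpy calls.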

brentp commented on July 19, 2024

And here is the source of add_ci, which works around svtyper requiring CIPOS and CIEND, and around a header parsing bug in svtyper:

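import hts/vcf

var ivcf:VCF
var ovcf:VCF

# Read the input VCF from stdin.
if not open(ivcf, "/dev/stdin"):
  quit "could not open VCF from /dev/stdin"

# Declare the confidence-interval INFO fields (svtyper requires CIPOS and CIEND).
doAssert ivcf.header.add_info("CIPOS", "2", "Integer", "ci") == Status.OK
doAssert ivcf.header.add_info("CIEND", "2", "Integer", "ci") == Status.OK
doAssert ivcf.header.add_info("CIPOS95", "2", "Integer", "ci") == Status.OK
doAssert ivcf.header.add_info("CIEND95", "2", "Integer", "ci") == Status.OK
# Remove these INFO definitions from the header (presumably the ones that
# trigger the svtyper header parsing bug).
doAssert ivcf.header.remove_info("MultiTechExact") == Status.OK
doAssert ivcf.header.remove_info("MultiTech") == Status.OK
doAssert ivcf.header.remove_info("DistPASSHG2gt49Minlt1000") == Status.OK

# Write the result to with-ci.vcf using the modified header.
if not open(ovcf, "with-ci.vcf", mode="w"):
  quit "could not open with-ci.vcf for writing"

ovcf.copy_header(ivcf.header)
doAssert ovcf.write_header()

# Use a fixed +/- 2 bp interval for every variant.
var ci = @[-2'i32, 2]
for v in ivcf:
  doAssert v.info.set("CIPOS", ci) == Status.OK
  doAssert v.info.set("CIEND", ci) == Status.OK
  doAssert v.info.set("CIPOS95", ci) == Status.OK
  doAssert v.info.set("CIEND95", ci) == Status.OK
  # Drop the per-record values of the fields removed from the header;
  # a record may not carry them, so the status is ignored.
  discard v.info.delete("MultiTech")
  discard v.info.delete("MultiTechExact")
  discard v.info.delete("DistPASSHG2gt49Minlt1000")
  doAssert ovcf.write_variant(v)

ovcf.close()
ivcf.close()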
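
For anyone without hts-nim installed, the same idea can be roughly sketched in plain Python on uncompressed VCF text. This is an illustrative port, not the script above: the function names `info_id` and `add_ci` are hypothetical, it assumes well-formed tab-separated records with at least eight columns, and it does no validation beyond that.

```python
from typing import Iterable, Iterator, Optional

# INFO fields removed from header and records, as in the script above.
DROP = {"MultiTech", "MultiTechExact", "DistPASSHG2gt49Minlt1000"}
# Confidence-interval fields that get a fixed value on every record.
CI_KEYS = ("CIPOS", "CIEND", "CIPOS95", "CIEND95")
CI_VALUE = "-2,2"
REPLACED = DROP | set(CI_KEYS)


def info_id(header_line: str) -> Optional[str]:
    """Return the ID of an ##INFO header line, or None for other lines."""
    prefix = "##INFO=<ID="
    if not header_line.startswith(prefix):
        return None
    return header_line[len(prefix):].split(",", 1)[0].rstrip(">\n")


def add_ci(lines: Iterable[str]) -> Iterator[str]:
    """Yield VCF lines with fixed CI fields added and DROP fields removed."""
    for line in lines:
        if line.startswith("##"):
            # Skip INFO definitions that are dropped or redefined below.
            if info_id(line) in REPLACED:
                continue
            yield line
        elif line.startswith("#CHROM"):
            # Emit the CI definitions just before the column header line.
            for key in CI_KEYS:
                yield f'##INFO=<ID={key},Number=2,Type=Integer,Description="ci">\n'
            yield line
        elif line.strip():
            cols = line.rstrip("\n").split("\t")
            kept = [f for f in cols[7].split(";")
                    if f and f != "." and f.split("=", 1)[0] not in REPLACED]
            kept += [f"{key}={CI_VALUE}" for key in CI_KEYS]
            cols[7] = ";".join(kept)
            yield "\t".join(cols) + "\n"
```

It could be driven with something like `sys.stdout.writelines(add_ci(sys.stdin))`; unlike the Nim version, it leaves the choice of output file to the caller.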

traxexx commented on July 19, 2024

Hi Brent,

Thanks for looking into this!

For the recall of Paragraph, we got results similar to yours on the NIST truth set
v0.6 (see supplementary materials). The NIST truth set is not guaranteed
to be breakpoint accurate, so some Paragraph FNs are likely due to inaccurate
breakpoints.

We also observed that svtyper has good recall for >300bp deletions (Fig 2a),
but for smaller deletions its performance appears to drop sharply. Since a large
fraction of deletions are smaller than 300bp (Fig 2b), we get a much lower
estimate for the overall recall (Table 1).
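
The size effect described above can be illustrated with a small Python sketch. The per-bin counts are hypothetical and are not taken from the paper's figures or tables; they only show how a large, low-recall bin pulls the overall recall down.

```python
def recall(tp: int, fn: int) -> float:
    """Recall = TP / (TP + FN); returns 0.0 for an empty bin."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical (true positive, false negative) counts per size bin.
# These numbers are assumptions for illustration, NOT results from the paper.
bins = {
    "<300bp": (300, 700),   # many events, low recall
    ">=300bp": (270, 30),   # fewer events, high recall
}

per_bin = {name: recall(tp, fn) for name, (tp, fn) in bins.items()}
overall = recall(
    sum(tp for tp, _ in bins.values()),
    sum(fn for _, fn in bins.values()),
)
```

With these made-up counts the overall recall sits between the two bins and much closer to the small-deletion bin, because that bin holds most of the events.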

I'd say the test set matters too. NIST tier1 contains only confident regions of
the genome. And deletion size is important when making such comparisons.

And we indeed added CIPOS & CIEND when evaluating svtyper.

fritzsedlazeck commented on July 19, 2024

Hey @brentp
thanks for looking at this. Always interesting to see how things run in other people's hands.
Our calls are here: https://github.com/Illumina/paragraph/blob/master/data/download-instructions.txt

Would be interesting to know why the calls from Lumpy are showing reduced performance.
Thanks
Fritz

brentp commented on July 19, 2024

hi Fritz, thanks for the reply. If you make the CRAM/BAM that you used available, I'll be glad to retry the evaluation on those variants+alignments.

fritzsedlazeck commented on July 19, 2024

you mean the PacBio data?
Or the Illumina reads?

brentp commented on July 19, 2024

I mean the Illumina HG002 ~35X sample.

traxexx commented on July 19, 2024

@brentp Yes, the HG002 long-read ground truth (LRGT) is available in the data/ directory. Minimum event length is 30 bp, and it covers del+ins+inv+dup. Note that in the paper we only used del & ins events of 50 bp to 10 kbp for evaluation.
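
As a rough illustration of that evaluation subset, here is a minimal Python sketch of a type-and-size filter. The `Event` type and the `used_for_evaluation` name are hypothetical, and treating both size bounds as inclusive is an assumption; the thread does not say how the boundaries were handled.

```python
from typing import NamedTuple


class Event(NamedTuple):
    svtype: str  # "DEL", "INS", "INV" or "DUP"
    length: int  # event length in bp


def used_for_evaluation(ev: Event, min_len: int = 50, max_len: int = 10_000) -> bool:
    """Keep only deletions and insertions within the size range.

    Inclusive bounds are an assumption made for this sketch.
    """
    return ev.svtype in {"DEL", "INS"} and min_len <= ev.length <= max_len


# Hypothetical events for illustration.
events = [Event("DEL", 30), Event("DEL", 120), Event("INS", 9_500),
          Event("INV", 400), Event("DEL", 25_000)]
kept = [ev for ev in events if used_for_evaluation(ev)]
```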

And in Table S2 we only tested Manta & Paragraph. We didn't test everything since such a comparison has already been done on LRGT.

It's interesting to see such a different recall on lumpy calls. We never tested our method on lumpy calls before. For now, I guess it's mostly because of breakpoints, as the PE method is unlikely to achieve base-pair accuracy. But we're going to double check.
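
One way to check the breakpoint hypothesis would be to measure how far each call's breakpoints sit from the matching truth event. The Python sketch below shows that measurement only; the function names and coordinates are hypothetical, and it is not how Paragraph itself evaluates breakpoints.

```python
from typing import Tuple

Interval = Tuple[int, int]  # (start, end) positions on the same chromosome


def breakpoint_offsets(call: Interval, truth: Interval) -> Tuple[int, int]:
    """Return (start_offset, end_offset) in bp between a call and a truth event."""
    return call[0] - truth[0], call[1] - truth[1]


def breakpoints_within(call: Interval, truth: Interval, tolerance: int = 0) -> bool:
    """True if both breakpoints are within `tolerance` bp of the truth."""
    ds, de = breakpoint_offsets(call, truth)
    return abs(ds) <= tolerance and abs(de) <= tolerance


# Hypothetical coordinates: a paired-end style call that is off by a few bases.
truth = (1000, 2000)
call = (1012, 1995)
offsets = breakpoint_offsets(call, truth)
```

Tabulating these offsets for the false negatives would show whether the missed events are mostly the ones with imprecise breakpoints.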

fritzsedlazeck commented on July 19, 2024

@traxexx I think he is asking for the Illumina reads that we used. I don't know where they are currently hosted.

traxexx commented on July 19, 2024

@brentp that's not public yet. We'll make it public eventually. For now please send me an email at [email protected] and I'll share the BAM with you via BaseSpace.
