comparison to svtyper (paragraph issue, closed)

illumina commented on July 19, 2024
comparison to svtyper

from paragraph.

Comments (10)

brentp commented on July 19, 2024

Hi, thanks for the response. Is a VCF of the LRGT set available?
I don't see svtyper deletions on the GIAB set in the supplement, only Manta and Paragraph in Table S2.

On lumpy output (where svtyper genotypes result in ~90% precision and 84% recall), paragraph gives only 20% recall and 98% precision.

I will likely leave this for now, feel free to close the issue. It would have saved me some time to know that paragraph currently won't work with lumpy output (due to lack of precision in breakpoints?)
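
For readers comparing the figures above, here is a minimal sketch of how precision and recall follow from true-positive, false-positive and false-negative counts. The function name and the counts are hypothetical, chosen only so the arithmetic lands on 98% precision and 20% recall; they are not the counts from this evaluation.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from TP, FP and FN counts."""
    # precision: fraction of reported calls that are correct
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # recall: fraction of true events that were reported
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts, for illustration only (not taken from the thread).
p, r = precision_recall(tp=196, fp=4, fn=784)
print(f"precision={p:.2f} recall={r:.2f}")
```

A genotyper can therefore be very precise (few wrong calls) while still missing most true events, which is the pattern described for paragraph on lumpy output.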

brentp commented on July 19, 2024

and here is the source of add_ci to get around svtyper requiring CIPOS and CIEND and a header parsing bug in svtyper:

import hts/vcf

var ivcf:VCF
var ovcf:VCF

# read the input VCF from stdin
if not open(ivcf, "/dev/stdin"):
  quit "could not open /dev/stdin"

# declare the confidence-interval INFO fields that svtyper requires
doAssert ivcf.header.add_info("CIPOS", "2", "Integer", "ci") == Status.OK
doAssert ivcf.header.add_info("CIEND", "2", "Integer", "ci") == Status.OK
doAssert ivcf.header.add_info("CIPOS95", "2", "Integer", "ci") == Status.OK
doAssert ivcf.header.add_info("CIEND95", "2", "Integer", "ci") == Status.OK
# drop these INFO fields from the header (presumably the ones that trigger the svtyper header parsing bug)
doAssert ivcf.header.remove_info("MultiTechExact") == Status.OK
doAssert ivcf.header.remove_info("MultiTech") == Status.OK
doAssert ivcf.header.remove_info("DistPASSHG2gt49Minlt1000") == Status.OK

if not open(ovcf, "with-ci.vcf", mode="w"):
  quit "could not open with-ci.vcf for writing"

ovcf.copy_header(ivcf.header)
doAssert ovcf.write_header()

# give every variant the same fixed interval of -2..+2 bp
var ci = @[-2'i32, 2]
for v in ivcf:
  doAssert v.info.set("CIPOS", ci) == Status.OK
  doAssert v.info.set("CIEND", ci) == Status.OK
  doAssert v.info.set("CIPOS95", ci) == Status.OK
  doAssert v.info.set("CIEND95", ci) == Status.OK
  # the removed fields may be absent on a given record, so the status is ignored
  discard v.info.delete("MultiTech")
  discard v.info.delete("MultiTechExact")
  discard v.info.delete("DistPASSHG2gt49Minlt1000")
  doAssert ovcf.write_variant(v)

ovcf.close()

traxexx commented on July 19, 2024

Hi Brent,

Thanks for looking into this!

For the recall of Paragraph, we got results similar to yours on the NIST truth set
v0.6 (see supplementary materials). The NIST truth set is not guaranteed
to be breakpoint accurate, so some Paragraph FNs are likely due to inaccurate
breakpoints.

We also observed that svtyper has a good recall for >300bp deletions (Fig 2a)
but for smaller deletions its performance appears to drop sharply. Since a large
fraction of deletions is smaller than 300bp (Fig 2b) we get a much lower
estimate for the overall recall (Table 1).

I'd say the test set matters too. NIST tier1 contains only confident regions of
the genome. And deletion size is important when making such comparisons.
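
To illustrate why deletion size matters for an overall recall estimate, here is a minimal sketch that reports recall per size bin. The function name, the toy data and every bin edge other than the 300 bp boundary mentioned above are assumptions made for illustration; this is not the evaluation code used for the paper.

```python
from collections import defaultdict


def recall_by_size(truth, size_edges=(50, 300, 1000, 10000)):
    """Compute recall per deletion-size bin.

    truth: iterable of (deletion_length_bp, was_detected) pairs.
    Returns {(lo, hi): recall} for each half-open bin [lo, hi) with events.
    """
    found = defaultdict(int)
    total = defaultdict(int)
    bins = list(zip(size_edges[:-1], size_edges[1:]))
    for length, detected in truth:
        for lo, hi in bins:
            if lo <= length < hi:
                total[(lo, hi)] += 1
                found[(lo, hi)] += int(detected)
                break
    return {b: found[b] / total[b] for b in bins if total[b]}


# Toy data (hypothetical): small deletions are mostly missed, large ones found.
toy = [(60, False), (120, False), (250, True),
       (400, True), (800, True), (5000, True)]
for size_bin, rec in recall_by_size(toy).items():
    print(size_bin, round(rec, 2))
```

When most true events fall in the smallest bin, the overall recall is dominated by that bin even if recall for larger deletions is high.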

And we indeed added CIPOS & CIEND when evaluating svtyper.

fritzsedlazeck commented on July 19, 2024

Hey @brentp
thanks for looking at this. Always interesting to see how things run in other people's hands.
Our calls are here: https://github.com/Illumina/paragraph/blob/master/data/download-instructions.txt

Would be interesting to know why the calls from Lumpy are showing reduced performance.
Thanks
Fritz

brentp commented on July 19, 2024

hi Fritz, thanks for the reply. If you make the cram/bam that you used available, I'll be glad to retry the evaluation on those variants+alignments.

fritzsedlazeck commented on July 19, 2024

you mean the Pacbio data?
Or the illumina reads?

brentp commented on July 19, 2024

I mean the illumina hg002 ~35X sample.

traxexx commented on July 19, 2024

@brentp Yes, the HG002 long-read ground truth (LRGT) is available in the data/ directory. The minimum event length is 30 bp, and it covers del+ins+inv+dup. Note that in the paper we only used del & ins events of 50 bp to 10 kbp for evaluation.
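
As a rough illustration of that size and type restriction, here is a minimal sketch that keeps only DEL/INS records within a length range. It assumes each record carries `SVTYPE` and `SVLEN` in the INFO column and treats both bounds as inclusive; the function names are hypothetical and this is not the filter used for the paper.

```python
def parse_info(info_field):
    """Turn a VCF INFO string like 'SVTYPE=DEL;SVLEN=-120' into a dict."""
    out = {}
    for item in info_field.split(";"):
        key, _, value = item.partition("=")
        out[key] = value
    return out


def keep_record(vcf_line, min_len=50, max_len=10_000, svtypes=("DEL", "INS")):
    """True if a VCF data line is a DEL/INS with min_len <= |SVLEN| <= max_len."""
    fields = vcf_line.rstrip("\n").split("\t")
    info = parse_info(fields[7])  # INFO is the 8th VCF column
    if info.get("SVTYPE") not in svtypes:
        return False
    try:
        # SVLEN is usually negative for deletions; take the first value if listed
        svlen = abs(int(info.get("SVLEN", "").split(",")[0]))
    except ValueError:
        return False  # missing or non-numeric SVLEN
    return min_len <= svlen <= max_len


# Hypothetical records, for illustration only.
lines = [
    "1\t1000\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;SVLEN=-120",
    "1\t2000\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;SVLEN=-35",
    "1\t3000\t.\tN\t<INV>\t.\tPASS\tSVTYPE=INV;SVLEN=500",
]
print([keep_record(line) for line in lines])
```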

And in Table S2 we only tested Manta & Paragraph. We didn't test everything since that comparison had already been done on LRGT.

It's interesting to see such a different recall on lumpy calls. We never tested our method on lumpy calls before. For now, I guess it's mostly because of breakpoints, as the PE method is unlikely to achieve base-pair accuracy. But we're going to double check.

fritzsedlazeck commented on July 19, 2024

@traxexx I think he is asking for the Illumina reads that we used. I don't know where they are currently hosted.

traxexx commented on July 19, 2024

@brentp that's not public yet. We'll eventually make it public. For now please send me an email at [email protected] and I'll share the bam with you via Basespace.
