moka-guys / dnanexus_happy

DNAnexus app comparing a query VCF to a truth VCF to calculate performance metrics including sensitivity and precision using hap.py and vcfeval

dnanexus_happy's Introduction

vcfeval_hap.py

hap.py version

v0.3.9 (Docker: https://hub.docker.com/r/pkrusche/hap.py/)

What does this app do?

Compares a query VCF to a truth VCF to calculate performance metrics including sensitivity and precision using hap.py and vcfeval. It is equivalent to running the precisionFDA GA4GH benchmarking app in 'vcfeval-partialcredit' mode with other options left as default. More information available at the following links:

What are typical use cases for this app?

Validating an NGS workflow using the NA12878 (NIST Genome in a Bottle) benchmarking sample.
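Sensitivity (recall) and precision are simple ratios of true-positive (TP), false-positive (FP) and false-negative (FN) counts. The bash function below is a minimal sketch of those standard definitions, plus the F1 score. `vcf_metrics` is a hypothetical helper name, not part of the app. hap.py computes its reported figures itself and may count TPs separately on the truth and query side, so treat this as an illustration only.

```bash
# Hypothetical helper: standard benchmarking ratios from raw counts.
#   recall (sensitivity) = TP / (TP + FN)
#   precision            = TP / (TP + FP)
#   F1                   = 2TP / (2TP + FP + FN)
# A ratio whose denominator is zero is reported as 0.
vcf_metrics() {
  awk -v tp="$1" -v fp="$2" -v fneg="$3" 'BEGIN {
    recall    = (tp + fneg > 0) ? tp / (tp + fneg) : 0
    precision = (tp + fp > 0) ? tp / (tp + fp) : 0
    f1        = (2 * tp + fp + fneg > 0) ? 2 * tp / (2 * tp + fp + fneg) : 0
    printf "recall=%.4f precision=%.4f f1=%.4f\n", recall, precision, f1
  }'
}
```

For example, `vcf_metrics 11068 36 105` uses the counts from the R snippet in the "Output confidence intervals" issue. It prints `recall=0.9906 precision=0.9968 f1=0.9937`.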

What data are required for this app to run?

Input files:

  1. A query VCF (.vcf | .vcf.gz) - output from the workflow being validated
  2. A truth VCF (.vcf | .vcf.gz)
  3. A panel BED file (.bed) - regions covered by the query VCF
  4. A high-confidence region BED file (.bed) - high-confidence regions for the truth set

Parameters:

  1. Skip - default = false. If set to true, the app exits without performing any analysis
  2. Output files prefix (required)
  3. Output folder (optional)
  4. Whether to perform additional stratification for NA12878 samples (default = False)
    • If the truth set is NA12878, results can be further stratified and output in the extended.csv file
    • HOWEVER, the instance type will need to be upgraded to one with at least 7 GB of RAM, and the app will take significantly longer to run
  5. Reference genome build: GRCh37 (default) or GRCh38

Note:

  • The BED file names must not contain spaces or characters such as + and -
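A pre-flight check for the naming rule above could look like the following sketch. `bed_name_ok` is a hypothetical helper. It rejects only the characters the note names (spaces, `+`, `-`) and also assumes a `.bed` extension, so other problematic characters may still need to be added.

```bash
# Hypothetical pre-flight check for the BED naming rule.
# Returns 0 if the file name (directory part ignored) looks safe, and 1 if it
# contains a space, '+' or '-', or does not end in .bed.
bed_name_ok() {
  local name="${1##*/}"          # strip any leading directory path
  case "$name" in
    *" "* | *+* | *-*) return 1 ;;
    *.bed) return 0 ;;
    *) return 1 ;;
  esac
}
```

A script could call it before launching the app, e.g. `bed_name_ok "$panel_bed" || exit 1`, where `panel_bed` is a hypothetical variable holding the panel BED path.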

What does this app output?

This app outputs:

  1. Summary CSV file containing separate performance metrics for SNPs and indels
  2. Summary report HTML (generated using GA4GH rep.py https://github.com/ga4gh/benchmarking-tools/tree/master/reporting/basic)
  3. Detailed results folder containing:
    • Extended CSV file - including stratified results and confidence intervals
    • VCF file - annotated VCF showing TP, FP and FN variants
    • runinfo JSON - detailed information about the hap.py run
    • version log - version numbers of the software used in the app
    • metrics JSON - JSON file containing all computed metrics and tables
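To pull the headline SNP and indel figures out of the summary CSV in a script, something like the sketch below could work. It is a hypothetical helper (`summary_metrics`). It assumes the CSV has a header row with columns named `Type`, `Filter`, `METRIC.Recall` and `METRIC.Precision`, and a `PASS` row per variant type, as hap.py summary files usually do. Check these assumptions against a real output file first; the function looks columns up by name and fails if any are missing.

```bash
# Hypothetical helper: print "Type Recall Precision" for the PASS rows of a
# hap.py-style summary CSV. Columns are located by header name; the names
# used here (Type, Filter, METRIC.Recall, METRIC.Precision) are assumptions.
summary_metrics() {
  awk -F',' '
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("Type" in col) || !("Filter" in col) ||
          !("METRIC.Recall" in col) || !("METRIC.Precision" in col)) {
        print "summary_metrics: unexpected header" > "/dev/stderr"
        exit 1
      }
      next
    }
    $col["Filter"] == "PASS" {
      print $col["Type"], $col["METRIC.Recall"], $col["METRIC.Precision"]
    }' "$1"
}
```

Looking columns up by name rather than position keeps the helper working if extra columns are added or reordered between hap.py versions.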

How does this app work?

What are the limitations of this app?

  • Only works with inputs mapped to GRCh37 or GRCh38
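Because the app only supports GRCh37 and GRCh38, and the build is chosen as a parameter, it can help to check which build a VCF was called against. The sketch below is a hypothetical helper (`guess_build`). It reads the length declared for chromosome 1 in the `##contig` header lines: 249,250,621 bp in GRCh37 and 248,956,422 bp in GRCh38. Not every VCF carries `##contig` lines, so `unknown` is a possible answer.

```bash
# Hypothetical helper: guess the reference build from the length declared for
# chromosome 1 ("1" or "chr1") in the VCF's ##contig header lines.
# Prints GRCh37, GRCh38 or unknown. Works on .vcf and .vcf.gz, because
# gzip -cdf passes uncompressed input through unchanged.
guess_build() {
  gzip -cdf "$1" | awk '
    /^#CHROM/ { exit }                         # header ended without a match
    /^##contig=<ID=(chr)?1,/ {
      if (match($0, /length=[0-9]+/)) {
        len = substr($0, RSTART + 7, RLENGTH - 7) + 0
        if (len == 249250621) build = "GRCh37"
        else if (len == 248956422) build = "GRCh38"
      }
      exit
    }
    END { print (build != "" ? build : "unknown") }'
}
```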

This app was made by Viapath Genome Informatics

dnanexus_happy's People

Contributors

andyb3, graeme-smith, amyamelia, rachelduffin, rebeccahaines1, woook

dnanexus_happy's Issues

Get rid of /work directory and symbolic home links

In the original app provided to us, all inputs are transferred to a /work directory under root, a symbolic link is set up to redirect home/in to work/in, and the HOME variable is set to the work folder:

# Move inputs into 'work' folder and create symbolic links
mkdir /work
cd /work
export HOME=/work
mv /home/dnanexus/in in
ln -sf /work/in /home/dnanexus/in

Never quite understood why this was done like this; possibly it was to do with the installation of hap.py. Now that hap.py is run in Docker, remove these steps to make the code less confusing.

precisionFDA version of hap.py ignores X chromosome

This causes problems, particularly with our indel data, which is largely on the X chromosome.

From precisionFDA site:

It should be noted that chromosome X is not included in the confident regions of the GiaB truth data for HG002, so the comparisons have not evaluated that chromosome.
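A quick way to confirm whether a high-confidence BED covers chromosome X at all is to total the bases it spans there. The sketch below is a hypothetical helper (`bed_bases_on_chrom`). It accepts both `X` and `chrX` naming, relies on BED intervals being 0-based and half-open (length = end - start), and does not merge overlapping intervals.

```bash
# Hypothetical helper: total bases a BED file covers on one chromosome,
# matching both "X" and "chrX" style names. Each line contributes
# end - start; overlapping intervals are not merged, so they would be
# double-counted.
bed_bases_on_chrom() {
  awk -F'\t' -v c="$2" '
    $1 == c || $1 == "chr" c { total += $3 - $2 }
    END { print total + 0 }' "$1"
}
```

A result of `0` from `bed_bases_on_chrom high_conf.bed X` would mean no chromosome X variants can be assessed against that truth set.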

Upgrade OS

Add:
"distribution": "Ubuntu",
"release": "14.04",
to the runSpec section of dxapp.json

Bug when assessing identical files

DNAnexus project 003_170713_GATKReproducibility

Query VCF: 3_WES3_Test2_48_136819_PN_WES_3_S2_R1_001.vcf.gz
Truth VCF: 1_WES3_Test2_48_136819_PN_WES_3_S2_R1_001.vcf.gz

High conf & panel BED files: Agilent WES

If this is run through PrecisionFDA vcfeval -> 100% sensitivity
If run through DNAnexus VCFeval_happy -> 2 FP
If run through DNAnexus VCFeval -> 22 FP

Running a diff on the VCFs -> identical except for the timestamp

Might be worth updating vcfeval_happy to catch up with PrecisionFDA version?
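The manual check in this issue (diff the files, ignore the timestamp) can be scripted as below. `vcf_body_identical` is a hypothetical bash helper; it needs bash for process substitution. It drops every `##` meta-information line, which is where date stamps and command-line records usually live, and compares the `#CHROM` header and records byte for byte. It only confirms that the inputs match. It does not explain the FP discrepancy.

```bash
# Hypothetical helper: succeed (exit 0) if two VCFs are identical once all
# '##' meta-information lines are removed. Accepts .vcf or .vcf.gz, because
# gzip -cdf passes uncompressed input through unchanged.
vcf_body_identical() {
  cmp -s <(gzip -cdf "$1" | grep -v '^##') \
         <(gzip -cdf "$2" | grep -v '^##')
}
```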

Output confidence intervals

R code:

install.packages("epiR")
library("epiR")

# Provide counts in order TP, FP, FN, TN (byrow = TRUE: row 1 = TP, FP; row 2 = FN, TN)
dat <- as.table(matrix(c(11068,36,105,14141122), nrow = 2, byrow = TRUE))
rval <- epi.tests(dat, conf.level = 0.95)
print(rval); summary(rval)
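The R snippet above relies on epiR. For a quick check without R, the sketch below computes a Wilson score interval (95% by default) for a single proportion. Examples are sensitivity (TP out of TP + FN) and precision (TP out of TP + FP). The Wilson method is swapped in here and is not necessarily what epiR or hap.py uses, so bounds may differ slightly in the last digits. `wilson_ci` is a hypothetical helper name.

```bash
# Hypothetical helper: Wilson score interval for x successes out of n trials.
# Usage: wilson_ci <x> <n> [z]   (z defaults to 1.96, roughly 95% confidence)
# Prints "lower upper" to 4 decimal places, or "NA NA" when n is 0.
wilson_ci() {
  awk -v x="$1" -v n="$2" -v z="${3:-1.96}" 'BEGIN {
    if (n <= 0) { print "NA NA"; exit }
    p = x / n
    z2 = z * z
    denom = 1 + z2 / n
    centre = (p + z2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    printf "%.4f %.4f\n", centre - half, centre + half
  }'
}
```

With the counts above, `wilson_ci 11068 $((11068 + 105))` gives the sensitivity interval `0.9886 0.9922`.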
