VCF-descriptions

Description of VCF fields for different variant-callers
Platypus. See more at: http://www.well.ox.ac.uk/platypus

Example of a multi-sample VCF (5 samples; adapted from an original VCF with 45 samples)

--------------------------------------------
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1 sample2 sample3 sample4 sample5 etc.
chr1 93257374 . A G 936.64 PASS AC=79;AN=90;BRF=0.0;FR=1.0000;HP=1;HapScore=2;MGOF=8;MMLQ=41;MQ=60.0;NF=0;NR=12;PP=457;QD=40.9809343789;SC=AAATTTATGTAGCTTTTATTA;SF=0,1,2,3,4,5,etc.;SbPval=1.0;Source=File;TC=14;TCF=0;TCR=14;TR=12;WE=93257382;WS=93257364 GT:GL:GQ:GOF:NR:NV 1/1:-49.2,-3.31,0.0:33:8:14:12 0/1:-68.72,0.0,-55.72:99:31:39:21 0/1:-49.9,0.0,-36.6:99:23:31:16 1/1:-102.5,-7.5,0.0:75:12:26:26 1/1:-233.6,-17.71,0.0:99:12:60:60 etc.

AC=79, Allele count in genotypes.
AN=90, Total number of alleles in called genotypes.
BRF=0.0, Fraction of reads around this variant that failed filters.
FR=1.0000, Estimated population frequency of variant.
HP=1, Homopolymer run length around variant locus.
HapScore=2, Haplotype score measuring the number of haplotypes the variant is segregating into in a window.
MGOF=8, Worst goodness-of-fit value reported across all samples.
MMLQ=41, Median minimum base quality for bases around variant.
MQ=60.0, RMS Mapping Quality.
NF=0, Total number of forward reads containing this variant.
NR=12, Total number of reverse reads containing this variant.
FS (not present in the example record above), Fisher's exact test for strand bias (Phred scale).
PP=457, Posterior probability (phred scaled) that this variant segregates.
QD=40.9809343789, Variant-quality/read-depth for this variant.
SC=AAATTTATGTAGCTTTTATTA, Genomic sequence 10 bases either side of variant position.
SF=0,1,2,3,4,5,etc., Source File (index to sourceFiles, f when filtered).
SbPval=1.0, Binomial P-value for strand bias test.
Source=File, Was this variant suggested by Platypus, Assembler, or from a VCF?
TC=14, Total coverage at this locus.
TCF=0, Total forward strand coverage at this locus.
TCR=14, Total reverse strand coverage at this locus.
TR=12, Total number of reads containing this variant.
WE=93257382, End position of calling window.
WS=93257364, Starting position of calling window.
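The INFO column shown above is a semicolon-separated list of key=value pairs, so it can be split into a lookup table with a few lines of Python. The following is a minimal sketch, not part of Platypus: the function name `parse_info` is hypothetical, the INFO string is an abridged copy of the example record, and all values are kept as strings.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; keys without '=' map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue  # tolerate a trailing or doubled semicolon
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


# Abridged INFO string from the example record (SC, SF, SbPval, Source omitted).
example_info = (
    "AC=79;AN=90;BRF=0.0;FR=1.0000;HP=1;HapScore=2;MGOF=8;MMLQ=41;MQ=60.0;"
    "NF=0;NR=12;PP=457;TC=14;TCF=0;TCR=14;TR=12;WE=93257382;WS=93257364"
)

parsed = parse_info(example_info)
# Distance between the start and end of the calling window.
window_span = int(parsed["WE"]) - int(parsed["WS"])
print(parsed["AC"], parsed["AN"], window_span)
```

Values stay as strings because some INFO fields can hold comma-separated lists (for example SF), so any numeric conversion is left to the caller.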


FILTER field description

GOF=Variant fails goodness-of-fit test.
HapScore=Too many haplotypes are supported by the data in this region.
MQ=Root-mean-square mapping quality across calling region is low.
Q20=Variant quality is below 20.
QD=Variants fail quality/depth filter.
QualDepth=Variant quality/Read depth ratio is low.
REFCALL=This line represents a homozygous reference call.
SC=Variants fail sequence-context filter. Surrounding sequence is low-complexity.
alleleBias=Variant frequency is lower than expected for het.
badReads=Variant supported only by reads with low quality bases close to variant position, and not present on both strands.
hp10=Flanking sequence contains homopolymer of length 10 or greater.
strandBias=Variant fails strand-bias filter.
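To look up the filter codes above programmatically, the FILTER column can be split on semicolons and matched against a table of descriptions. This is a sketch under a few assumptions: `explain_filter` is a hypothetical helper, the table copies only some of the codes listed above, and the combined value `badReads;MQ` is an invented input for illustration, not taken from a real Platypus output.

```python
# Subset of the FILTER codes described above; extend as needed.
FILTER_DESCRIPTIONS = {
    "GOF": "Variant fails goodness-of-fit test.",
    "MQ": "Root-mean-square mapping quality across calling region is low.",
    "Q20": "Variant quality is below 20.",
    "badReads": "Variant supported only by reads with low quality bases "
                "close to variant position, and not present on both strands.",
    "hp10": "Flanking sequence contains homopolymer of length 10 or greater.",
    "strandBias": "Variant fails strand-bias filter.",
}


def explain_filter(filter_value):
    """Return (code, description) pairs for each failed filter.

    'PASS' (all filters passed) and '.' (filters not applied) yield an
    empty list.
    """
    if filter_value in ("PASS", "."):
        return []
    return [
        (code, FILTER_DESCRIPTIONS.get(code, "unknown filter code"))
        for code in filter_value.split(";")
    ]


# Hypothetical FILTER value combining two codes.
for code, description in explain_filter("badReads;MQ"):
    print(f"{code}: {description}")
```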


INFO field description

AC=Allele count in genotypes.
AN=Total number of alleles in called genotypes.
BRF=Fraction of reads around this variant that failed filters.
END=Stop position of the interval.
FR=Estimated population frequency of variant.
FS=Fisher's exact test for strand bias (Phred scale).
HP=Homopolymer run length around variant locus.
HapScore=Haplotype score measuring the number of haplotypes the variant is segregating into in a window.
MGOF=Worst goodness-of-fit value reported across all samples.
MMLQ=Median minimum base quality for bases around variant.
MQ=RMS Mapping Quality.
NF=Total number of forward reads containing this variant.
NR=Total number of reverse reads containing this variant.
PP=Posterior probability (phred scaled) that this variant segregates.
QD=Variant-quality/read-depth for this variant.
ReadPosRankSum=Mann-Whitney Rank sum test for difference in positions of variants in reads from ref and alt.
SC=Genomic sequence 10 bases either side of variant position.
SF=Source File (index to sourceFiles, f when filtered).
START=Start position of reference call block.
SbPval=Binomial P-value for strand bias test.
Size=Size of reference call block.
Source=Was this variant suggested by Platypus, Assembler, or from a VCF?
TC=Total coverage at this locus.
TCF=Total forward strand coverage at this locus.
TCR=Total reverse strand coverage at this locus.
TR=Total number of reads containing this variant.
WE=End position of calling window.
WS=Starting position of calling window.
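Several of the INFO fields above are counts that can be combined into simple ratios. The sketch below uses the values from the example record (AC=79, AN=90, NF=0, NR=12, TR=12, TC=14, TCF=0, TCR=14). In that record NF + NR equals TR and TCF + TCR equals TC; whether this holds for every record is an assumption, so the code checks it rather than relying on it. The variable names are hypothetical, and a bi-allelic site is assumed (multi-allelic sites carry comma-separated values).

```python
# Values copied from the example record, already converted to integers.
info = {"AC": 79, "AN": 90, "NF": 0, "NR": 12, "TR": 12,
        "TC": 14, "TCF": 0, "TCR": 14}

# Fraction of called alleles that are the variant allele.
alt_allele_fraction = info["AC"] / info["AN"]

# Fraction of reads at the locus that contain the variant.
variant_read_fraction = info["TR"] / info["TC"]

# Consistency checks observed to hold in the example record.
variant_reads_consistent = info["NF"] + info["NR"] == info["TR"]
coverage_consistent = info["TCF"] + info["TCR"] == info["TC"]

print(round(alt_allele_fraction, 3), round(variant_read_fraction, 3))
print(variant_reads_consistent, coverage_consistent)
```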


FORMAT field description

GL=Genotype log10-likelihoods for AA, AB and BB genotypes, where A = ref and B = variant. Only applicable for bi-allelic sites.
GOF=Goodness of fit value.
GQ=Genotype Quality.
GT=Unphased genotypes.
NR=Number of reads covering variant location in this sample.
NV=Number of reads containing variant in this sample.
PL=Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification.
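Each sample column follows the key order given in the FORMAT column, so the two can be zipped together. The sketch below parses sample1 from the example record and converts its GL values (log10 likelihoods) into normalized Phred-scaled values by multiplying by -10 and shifting so the most likely genotype is 0. The helper names `parse_sample` and `gl_to_pl` are hypothetical, and the conversion is a common convention rather than something confirmed from the Platypus source.

```python
def parse_sample(format_keys, sample_value):
    """Pair FORMAT keys with one sample's colon-separated values."""
    return dict(zip(format_keys.split(":"), sample_value.split(":")))


def gl_to_pl(gl):
    """Convert comma-separated log10 likelihoods to normalized Phred values."""
    values = [float(x) for x in gl.split(",")]
    best = max(values)  # most likely genotype is shifted to 0
    return [round(-10 * (v - best)) for v in values]


# FORMAT column and sample1 column from the example record.
sample = parse_sample("GT:GL:GQ:GOF:NR:NV", "1/1:-49.2,-3.31,0.0:33:8:14:12")
pl = gl_to_pl(sample["GL"])

# Fraction of this sample's reads that contain the variant (NV / NR).
variant_fraction = int(sample["NV"]) / int(sample["NR"])

print(sample["GT"], pl, round(variant_fraction, 3))
```

For sample1 the second-smallest converted value (33) coincides with the reported GQ of 33; this is an observation on the example record, not a documented rule.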

