GithubHelp home page GithubHelp logo

upd's Introduction

UPD.py

Simple software to call UPD regions from germline exome/wgs trios.

Documentation

Installation

pip install --editable . or python setup.py install

The only dependencies are click and coloredlogs.

Input VCF requirements

  • Trio (proband-mother-father) VCF is required (i.e. all three individuals in the same VCF).
  • GQ must be present in genotype fields.
  • Must be annotated with some sort of population frequency. This can also be in the CSQ annotation (default: MAX_AF), if so use --vep

Running

upd --vcf input.vcf.gz --proband PB_ID --mother MOTHER_ID --father FATHER_ID regions --out upd_regions.bed

Where PB_ID/MOTHER_ID/FATHER_ID are the sample IDs from the vcf header.

Optional parameters

Command Parameter Description
base --min-af (DEFAULT: 0.05) Minimum population frequency required to include the SNP in the analysis.
base --af-tag (DEFAULT: MAX_AF) Specifies which frequency field to be used when selecting SNPs.
base --min-gq (DEFAULT: 30) Specifies the minimum GQ required to include a variant in the analysis. All three individuals' must have a GQ larged than or equal to this.
base --vep (flag) If given, search the CSQ field for af-tag
base --vep (flag) If given, search the CSQ field for af-tag
regions --min-sites (DEFAULT: 3) Minimum number of consecutive UPD sites needed to call an UPD region.
regions --min-size (DEFAULT: 1000) Minimum number of base pairs between first and last UPD site in a region required to call it.
regions/sites --out (DEFAULT: stdout) If the results should be printed to a file
regions/sites --iso-het-pct (DEFAULT: 0.01) Threshold ratio for calling homodisomy

Output

Called regions (udp regions)

BED file of all called regions fulfilling the --min-sites and --min-size filter criteria, including some annotation.

Example call:

15      22958103        101910550       ORIGIN=PATERNAL;TYPE=HETERODISOMY;LOW_SIZE=78952446;INF_SITES=182;SNPS=2896;HET_HOM=1275/1440;OPP_SITES=0;START_LOW=20170150;END_HIGH=102516492;HIGH_SIZE=82346342
Field Description
ORIGIN The parental origin of the UPD region
TYPE Indicates whether region is heterodisomic or isodisomic/deletion. See caveats
LOW_SIZE Lower bound of the estimated size of the called region. Distance between first and last UPD site of the region
INF_SITES Number of UPD informative sites in the called regions
SNPS Total number of SNPs in the region
HET_HOM Number of heterozygous and homozygous sites in the region. Used to indicate if region has LOH
OPP_SITES Number of sites in the region that have the opposite parental origin. So no other number than 0 have been observed...
START_LOW Earliest possible start positon of the UPD region (position of last anti-UPD site prior to the region)
END_HIGH Last possible end position of the UPD region (position of first anti-UPD site prior to the region)
HIGH_SIZE Upper bound of the estimated size of the called region. Distance between the surrounding anti-UPD sites.

Informative sites (upd sites)

Fourth field indicates what type of field:

Type Description
ANTI_UPD Site where proband has inherited alleles from both parents (e.g. AA x BB = AB)
UPD_MATERNAL_ORIGIN Site where both alleles are inherited from the mother, or a fathers allele was deleted. (e.g. AA x BB = AA)
UPD_PATERNAL_ORIGIN Site where both alleles are inherited from the father, or a mothers allele was deleted. (e.g. AA x BB = BB)
PB_HETEROZYGOUS Heterozygous site in the proband (used only to call hetero/isodisomy)
PB_HOMOZYGOUS Homozygous site in the proband (used only to call hetero/isodisomy)
UNINFORMATIVE Various sites excluded from the analysis. Ignore these...

Caveats

The isodisomic/heterodisomic calling depends on estimating if there is a run of homozygousity in the called region. This is called using presence of heterozygous sites within the call, and should only be seen as a rough estimate for smaller calls. For more statistically sound detection of this, use e.g. bcftools roh to detect regions of homozygozity and combine the results.

Isodisomies cannot be differentiated from deletions using this software. Combine the calls with SV/CNV calls to determine if there is an overlapping deletion. If overlapping deletion, the "UPD" call can be used to detemine which parental allele was deleted.

upd's People

Contributors

bjhall avatar mikaell avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

upd's Issues

Test/Validation File

Hi, It would be nice to have a VCF with non-sensitive data that we can use for tests and validation. Do you have any like this that you can add to the repo @bjhall @dnil ?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.