
This project forked from hashihab/fathmm-mkl

fathmm-MKL

Predicting the functional consequences of both coding and non-coding single nucleotide variants (see http://fathmm.biocompute.org.uk).

For more information, please refer to the following publication:

Shihab HA, Rogers MF, Gough J, Mort M, Cooper DN, Day INM, Gaunt TR, Campbell C (2014). An Integrative Approach to Predicting the Functional Consequences of Non-coding and Coding Sequence Variation. Bioinformatics (In Press)

General Requirements

You will need the following packages installed on your system:

  • tabix (included as part of this repository)
  • Python (tested with Python 2.7)

Running the Software

  • Clone this repository
git clone https://github.com/HAShihab/fathmm-MKL
cd fathmm-MKL/
  • Download our pre-computed database:
wget http://fathmm.biocompute.org.uk/database/fathmm-MKL_Current.tab.gz

Note: this database contains one-based coordinates (positions). For true BED format (i.e. zero-based coordinates), please download the following database: http://fathmm.biocompute.org.uk/database/fathmm-MKL_Current_zerobased.tab.gz
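To make the difference between the two coordinate systems concrete, here is a minimal sketch of the usual BED convention (zero-based start, exclusive end) applied to a single-nucleotide position. The function name `one_based_to_bed` is hypothetical and is not part of fathmm-MKL.

```python
def one_based_to_bed(position):
    """Convert a one-based SNV position to a BED-style (start, end) pair.

    BED intervals are zero-based and half-open, so a single base at
    one-based position p is described as start = p - 1, end = p.
    """
    if position < 1:
        raise ValueError("one-based positions start at 1")
    return position - 1, position
```

For example, `one_based_to_bed(100)` gives `(99, 100)`.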

Datafile                                                                       md5sum
http://fathmm.biocompute.org.uk/database/fathmm-MKL_Current.tab.gz             b8f4dd120586a34c82d5cc87cfe2a4ca
http://fathmm.biocompute.org.uk/database/fathmm-MKL_Current_zerobased.tab.gz   c3213196a2471ade3742bd8f8a96d4cc
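One way to check a download against the checksums listed above is sketched below, using only the Python standard library. The helper names `md5_of_file` and `checksum_matches` are hypothetical, and the snippet assumes the database file sits in the current directory under its original name.

```python
import hashlib

# Checksums copied from the listing above.
EXPECTED_MD5 = {
    "fathmm-MKL_Current.tab.gz": "b8f4dd120586a34c82d5cc87cfe2a4ca",
    "fathmm-MKL_Current_zerobased.tab.gz": "c3213196a2471ade3742bd8f8a96d4cc",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that a large database never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected):
    """Compare a file's MD5 digest with an expected hex string."""
    return md5_of_file(path) == expected.lower()
```

Calling `checksum_matches("fathmm-MKL_Current.tab.gz", EXPECTED_MD5["fathmm-MKL_Current.tab.gz"])` should return `True` for an intact download; a `False` result suggests the file is incomplete or corrupted and should be fetched again.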
  • Add tabix to your PATH and create the database index file (please be patient, this may take a while!):
export PATH=./tabix-0.2.6/:$PATH
tabix -f -p bed fathmm-MKL_Current.tab.gz
  • Run our script using the following command:
python fathmm-MKL.py <fin> <fo> <db>

In the above command, <fin> is the list of mutations to process (see test.txt for an example), <fo> is the file to which the predictions are written, and <db> is the pre-computed database downloaded above.

Note: the database index file must be created before running our script. If this has not been created, your output will contain "No Prediction Found" for all variants!
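Because a missing index leads to "No Prediction Found" for every variant, it can be worth checking for the index before a long run. The sketch below relies on tabix writing its index next to the database with a `.tbi` suffix; the helper names are hypothetical and not part of fathmm-MKL.py.

```python
import os


def tabix_index_path(db_path):
    """Return the path where tabix stores the index for db_path."""
    return db_path + ".tbi"


def has_tabix_index(db_path):
    """Return True if the tabix index file for db_path exists."""
    return os.path.isfile(tabix_index_path(db_path))
```

For example, `has_tabix_index("fathmm-MKL_Current.tab.gz")` returning `False` means the `tabix -f -p bed` step still needs to be run.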

Prediction Interpretation

Predictions are given as p-values in the range [0, 1]: values above 0.5 are predicted to be deleterious, while those below 0.5 are predicted to be neutral or benign. P-values close to the extremes (0 or 1) are the highest-confidence predictions that yield the highest accuracy.
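The thresholding rule above can be sketched as a small helper. The function name, the "undetermined" label for a score of exactly 0.5 (a case the text does not cover), and the `margin` value are illustrative assumptions, not part of fathmm-MKL's output.

```python
def interpret_score(score, threshold=0.5):
    """Map a prediction score in [0, 1] to a (label, margin) pair.

    label  : "deleterious" above the threshold, "neutral" below it,
             "undetermined" exactly at the threshold (our assumption).
    margin : distance from 0.5 rescaled to [0, 1]; values near 1
             correspond to scores close to the extremes 0 or 1.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if score > threshold:
        label = "deleterious"
    elif score < threshold:
        label = "neutral"
    else:
        label = "undetermined"
    margin = abs(score - 0.5) * 2.0
    return label, margin
```

Here `interpret_score(0.75)` yields `("deleterious", 0.5)`, while a score of 1.0 or 0.0 yields a margin of 1.0.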

We use distinct predictors for positions in coding regions (positions within coding-sequence exons) and in non-coding regions (positions in intergenic regions, introns or non-coding genes). The coding predictor is based on 10 groups of features, labeled A-J; the non-coding predictor uses a subset of 4 of these feature groups, A-D (see our related publication for details on the groups and their sources).

Note: predictions based on a subset of features may not be as accurate as those based on complete feature sets. In particular, predictions that are missing the conservation score features (groups A and E) will tend to be less accurate than other predictions. To aid in interpreting these predictions, we provide a list of the feature groups that contributed to each prediction.
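As a rough illustration of how the list of contributing feature groups might be used, the sketch below flags predictions that lack some expected groups. It assumes the groups are reported as a string of letters such as "ABCD"; that format and all names here are hypothetical, so adapt the parsing to the actual output.

```python
CODING_GROUPS = set("ABCDEFGHIJ")   # groups A-J, coding predictor
NONCODING_GROUPS = set("ABCD")      # groups A-D, non-coding predictor
CONSERVATION_GROUPS = set("AE")     # conservation score features


def missing_groups(used, coding):
    """Return the sorted feature groups expected by the predictor
    but absent from `used` (a string of group letters)."""
    expected = CODING_GROUPS if coding else NONCODING_GROUPS
    return sorted(expected - set(used.upper()))


def lacks_conservation(used, coding):
    """Return True if a conservation group relevant to the predictor
    (A and E for coding, A only for non-coding) is missing."""
    return any(g in CONSERVATION_GROUPS for g in missing_groups(used, coding))
```

For instance, `missing_groups("BCDFGHIJ", coding=True)` returns `["A", "E"]`, so such a prediction would be treated with extra caution.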

Genome Build

FATHMM-MKL predictions are based on the GRCh37/hg19 genome build.

Contributing

We welcome any comments and/or suggestions that you may have regarding our software - please send an email to [email protected]
