piotyama / orphan-prediction

This project forked from eswlab/orphan-prediction

methods for orphan gene prediction paper optimization

Shell 74.15% Python 25.85%

orphan-prediction's Introduction

See overview and documentation: Documentation Status

Maximizing prediction of orphan genes in assembled genomes

Gene prediction and optimization using BIND and MIND workflows:

MIND: ab initio gene predictions by MAKER combined with gene predictions INferred Directly from alignment of RNA-Seq evidence to the genome.

BIND: ab initio gene predictions by BRAKER combined with gene predictions INferred Directly from alignment of RNA-Seq evidence to the genome.

1. Find an Orphan-Enriched RNA-Seq dataset from NCBI-SRA (See details here):

  • Search RNA-Seq datasets for your organism on NCBI, filter Runs (SRR) for Illumina, paired-end, HiSeq 2500 or newer.
  • Download Runs from NCBI (SRA-toolkit)
  • If an existing annotation is available, quantify the expression of every gene in every SRR with Kallisto.
  • Run phylostratr on the current gene models to infer the phylostratum of each gene model.
  • Rank the SRRs by the number of expressed orphans and select the top-ranked runs, up to an amount of data that is feasible to work with.
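
The ranking step above can be sketched in Python. This is a minimal illustration, not the repository's script: the function name, the input layout (run accession mapped to per-gene TPM values, e.g. parsed from Kallisto output), and the `min_tpm` cutoff of 1.0 are all assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple


def rank_runs_by_expressed_orphans(
    tpm: Dict[str, Dict[str, float]],
    orphan_ids: Set[str],
    min_tpm: float = 1.0,  # assumed expression cutoff, not taken from the source
) -> List[Tuple[str, int]]:
    """Rank runs by the number of orphan genes they express.

    tpm maps a run accession to {gene_id: TPM}; orphan_ids is the set of
    gene models assigned to the species-level phylostratum.
    """
    counts = []
    for run, gene_tpm in tpm.items():
        # Count orphans whose TPM in this run reaches the cutoff
        n_expressed = sum(
            1 for gene in orphan_ids if gene_tpm.get(gene, 0.0) >= min_tpm
        )
        counts.append((run, n_expressed))
    # Highest count first; ties broken by accession for a reproducible order
    return sorted(counts, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical accessions and gene IDs, for illustration only
    example_tpm = {
        "SRR_A": {"g1": 5.0, "g2": 0.2, "g3": 3.0},
        "SRR_B": {"g1": 0.0, "g2": 2.0, "g3": 0.5},
        "SRR_C": {"g1": 1.5, "g2": 4.0, "g3": 9.0},
    }
    for run, n in rank_runs_by_expressed_orphans(example_tpm, {"g1", "g2"}):
        print(run, n)
```

Taking the first few entries of the returned list gives the orphan-enriched subset; how many runs to keep depends on the compute and storage available.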

Note: If NCBI-SRA has no samples for your organism, and you are relying solely on RNA-Seq that you generate yourself, best practice is to maximize representation of all genes by including conditions like reproductive tissues and stresses in which orphan gene expression is high.

2. Ab initio gene prediction:

Pick one of the two ab initio prediction methods below:

  1. Run BRAKER (See details here):

    • Align RNA-Seq with a splice-aware aligner (STAR or HiSat2 preferred, HiSat2 used here)
    • Generate BAM file for each SRA-SRR id, merge them to generate a single sorted BAM file
    • Run BRAKER
  2. Run MAKER (See details here):

    • Align RNA-Seq with a splice-aware aligner (STAR or HiSat2 preferred, HiSat2 used here)
    • Generate BAM file for each SRA-SRR id, merge them to generate a single sorted BAM file
    • Run Trinity to generate transcriptome assembly using the BAM file
    • Run TransDecoder on Trinity transcripts to predict ORFs and translate them to protein
    • Run MAKER in homology-only mode with transcripts (Trinity) and proteins (TransDecoder and SwissProt)
    • Use the MAKER predictions to train SNAP and AUGUSTUS. Self-train GeneMark
    • Run a second round of MAKER with the ab initio predictors above (SNAP, AUGUSTUS, and GeneMark) plus the results from the previous MAKER round.
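
Both options start with the same alignment and BAM-merging steps. The sketch below only builds the command lines for those steps as Python lists (it does not run them), so it can be inspected or passed to `subprocess.run` later. The function name, the `<run>_1.fastq.gz` / `<run>_2.fastq.gz` naming, and the output file names are assumptions for illustration; the repository's own scripts may use different options.

```python
from typing import List


def alignment_commands(
    index: str,
    runs: List[str],
    threads: int = 4,
    merged_bam: str = "merged.bam",  # hypothetical output name
) -> List[List[str]]:
    """Build HISAT2 and samtools commands: align each run, sort, then merge."""
    commands: List[List[str]] = []
    sorted_bams: List[str] = []
    for run in runs:
        sam = f"{run}.sam"
        bam = f"{run}.sorted.bam"
        # Paired-end alignment of one run against a prebuilt HISAT2 index
        commands.append([
            "hisat2", "-p", str(threads), "-x", index,
            "-1", f"{run}_1.fastq.gz", "-2", f"{run}_2.fastq.gz",
            "-S", sam,
        ])
        # Coordinate-sort the alignments into one BAM file per run
        commands.append(["samtools", "sort", "-@", str(threads), "-o", bam, sam])
        sorted_bams.append(bam)
    # Merge the per-run sorted BAM files into a single sorted BAM file
    commands.append(
        ["samtools", "merge", "-@", str(threads), merged_bam] + sorted_bams
    )
    return commands


if __name__ == "__main__":
    for cmd in alignment_commands("genome_index", ["SRR_A", "SRR_B"], threads=8):
        print(" ".join(cmd))
```

Keeping the per-run BAM files is useful because the Direct Inference step assembles transcripts from each BAM file separately, while BRAKER and Trinity take the merged file.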

3. Direct Inference evidence-based predictions (See details here):

We provide an automated pipeline for evidence-based predictions (See details here)

  • Align RNA-Seq with a splice-aware aligner (STAR or HiSat2 preferred, HiSat2 used here)
  • Generate BAM file for each SRA-SRR id
  • For each BAM file, use multiple transcript assemblers for genome guided transcript assembly:
    • CLASS2
    • StringTie
    • Cufflinks
  • Run Portcullis to remove invalid splice junctions
  • Consolidate transcripts and generate a non-redundant set of transcripts using Mikado.
  • Predict ORFs on these consolidated transcripts using TransDecoder
  • Pick the best transcripts using all of the above information with Mikado pick.
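
To make the idea of a non-redundant transcript set concrete, the sketch below collapses transcripts from several assemblers that share the same intron chain and keeps the longest representative of each. This is a simplified illustration of redundancy removal only; it is not Mikado's algorithm, and the tuple layout, function names, and transcript IDs are hypothetical.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive
Transcript = Tuple[str, str, str, List[Exon]]  # (id, chrom, strand, exons)


def intron_chain(exons: List[Exon]) -> Tuple[Tuple[int, int], ...]:
    """Return the introns implied by consecutive exons, in genomic order."""
    ordered = sorted(exons)
    return tuple(
        (ordered[i][1] + 1, ordered[i + 1][0] - 1)
        for i in range(len(ordered) - 1)
    )


def collapse_by_intron_chain(transcripts: List[Transcript]) -> List[str]:
    """Keep one transcript ID (the longest span) per distinct intron chain."""
    best: Dict[tuple, Tuple[int, str]] = {}
    for tid, chrom, strand, exons in transcripts:
        chain = intron_chain(exons)
        if chain:
            key = (chrom, strand, "multi", chain)
        else:
            # Single-exon transcripts have no introns; key on exact coordinates
            key = (chrom, strand, "mono", tuple(sorted(exons)))
        span = max(end for _, end in exons) - min(start for start, _ in exons)
        if key not in best or span > best[key][0]:
            best[key] = (span, tid)
    return sorted(tid for _, tid in best.values())


if __name__ == "__main__":
    models = [
        ("stringtie.1", "chr1", "+", [(100, 200), (300, 400)]),
        ("cufflinks.1", "chr1", "+", [(90, 200), (300, 420)]),   # same introns
        ("class2.1", "chr1", "+", [(100, 200), (350, 400)]),     # different intron
        ("stringtie.2", "chr1", "-", [(500, 600)]),              # single exon
    ]
    print(collapse_by_intron_chain(models))
```

In the actual workflow this consolidation is delegated to Mikado, which also uses the Portcullis junctions and TransDecoder ORFs when scoring transcripts.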

4. Combine ab initio and Direct Inference evidence-based predictions:

If you ran BRAKER in step 2, run 4.1

  1. Merge BRAKER with Direct Inference (BIND) (See details here):

    • Use Mikado to combine BRAKER-generated predictions with Direct Inference evidence-based predictions.

If you ran MAKER in step 2, run 4.2

  2. Merge MAKER with Direct Inference (MIND) (See details here):

    • Use Mikado to combine MAKER-generated predictions with Direct Inference evidence-based predictions.

5. Evaluate your predictions (See details here):

  • Run BUSCO to see how well the conserved genes are represented in your final predictions
  • Run OrthoFinder to find and annotate orthologs present in your predictions
  • Run phylostratr to find orphan genes in your predictions
  • Add functional annotation to your genes using homology and InterProScan
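
Once each predicted gene has a phylostratum, orphans can be pulled out as the genes assigned to the species-level stratum. The sketch below assumes the phylostratr result has already been reduced to a mapping from gene ID to an integer stratum level, and that the caller knows which level corresponds to the focal species; both the input layout and the function name are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def summarize_phylostrata(
    gene_level: Dict[str, int],
    species_level: int,
) -> Tuple[List[str], Dict[int, int]]:
    """Return the orphan gene IDs and the number of genes per stratum level.

    gene_level maps gene_id -> stratum level; species_level is the level
    that corresponds to the focal species (assumed to be supplied by the caller).
    """
    # Orphans are the genes with no detectable homolog outside the focal species
    orphans = sorted(
        gene for gene, level in gene_level.items() if level == species_level
    )
    # Tally genes per stratum, ordered by level for easy reading
    per_level = dict(sorted(Counter(gene_level.values()).items()))
    return orphans, per_level


if __name__ == "__main__":
    # Hypothetical gene IDs and levels
    levels = {"g1": 1, "g2": 1, "g3": 5, "g4": 8, "g5": 8}
    orphans, per_level = summarize_phylostrata(levels, species_level=8)
    print("orphans:", orphans)
    print("genes per stratum:", per_level)
```

Comparing the orphan count from the final BIND or MIND predictions with the count from the starting annotation gives a quick check of how many additional candidate orphans the workflow recovered.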

Prediction tools include:

| Tool | Version | Purpose |
|---|---|---|
| SRA Tools | 2.9.6 | SRA access |
| HiSat2 | 2.2.0 | Alignment |
| STAR | 2.7.7a | Alignment |
| Kallisto | 0.46.2 | Quantification |
| Samtools | 1.10 | Tools |
| CLASS2 | 2.1.7 | Transcript Assembly |
| StringTie | 1.3.3 | Transcript Assembly |
| Cufflinks | 2.2.1 | Transcript Assembly |
| Trinity | 2.6.6 | Transcript Assembly |
| Portcullis | 1.2.2 | Tools |
| TransDecoder | 3.0.1 | CDS prediction |
| Mikado | 2.0 | Direct Inference prediction |
| phylostratr | 0.2.0 | Phylostratigraphy |
| BLAST | 3.11.0 | Tools |
| BRAKER | 2.1.2 | Ab initio prediction |
| MAKER | 2.31.10 | Ab initio prediction |
| GMAP-GSNAP | 2019-05-12 | Alignment |
| GeneMark | 4.83 | Ab initio prediction |

orphan-prediction's People

Contributors

lijing28101, aseetharam, urmi-21, eve-syrkin-wurtele, evewurtele, priyanka8590, jd-campbell