See overview and documentation:

Maximizing prediction of orphan genes in assembled genomes

Table of Contents
Gene prediction and optimization using BIND and MIND workflows:
Tools list

Gene prediction and optimization using BIND and MIND workflows:

MIND: ab initio gene predictions by MAKER combined with gene predictions INferred Directly from alignment of RNA-Seq evidence to the genome.
BIND: ab initio gene predictions by BRAKER combined with gene predictions INferred Directly from alignment of RNA-Seq evidence to the genome.

1. Find an Orphan-Enriched RNA-Seq dataset from NCBI-SRA (See details here):

Search NCBI for RNA-Seq datasets for your organism and filter Runs (SRR) for Illumina, paired-end data from HiSeq 2500 or newer instruments.
Download the Runs from NCBI (SRA Toolkit).
If existing annotations are available, quantify the expression of every gene in every SRR with Kallisto.
Run phylostratr on the current gene models to infer the phylostratum of each gene model.
Rank the SRRs by the number of expressed orphans and select a feasible amount of data to work with.
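The ranking step can be sketched in Python as follows. This is a minimal illustration, not the repository's own script: it assumes the Kallisto TPM values have already been loaded into a dictionary per run, that the orphan gene IDs come from the phylostratr result, and that a gene counts as "expressed" at 1 TPM or more. The function name, the threshold, and the sample IDs are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def rank_runs_by_orphans(
    tpm_by_run: Dict[str, Dict[str, float]],
    orphan_ids: Set[str],
    min_tpm: float = 1.0,  # assumed expression cutoff, not taken from the source
) -> List[Tuple[str, int]]:
    """Return (run, number of expressed orphans) pairs, highest count first."""
    counts = []
    for run, tpm in tpm_by_run.items():
        expressed = sum(
            1 for gene, value in tpm.items()
            if gene in orphan_ids and value >= min_tpm
        )
        counts.append((run, expressed))
    # Sort by count (descending), then by run ID so ties are deterministic.
    return sorted(counts, key=lambda item: (-item[1], item[0]))


# Hypothetical toy data: three runs, three genes, two of which are orphans.
example_tpm = {
    "SRR1": {"g1": 5.0, "g2": 0.2, "g3": 3.0},
    "SRR2": {"g1": 0.0, "g2": 2.0, "g3": 9.0},
    "SRR3": {"g1": 1.0, "g2": 1.5, "g3": 0.0},
}
example_ranking = rank_runs_by_orphans(example_tpm, {"g1", "g2"})
```

The top entries of the returned list are the runs to keep; how many to keep depends on the compute and storage you have available.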

Note: If NCBI-SRA has no samples for your organism, and you are relying solely on RNA-Seq that you generate yourself, best practice is to maximize representation of all genes by including conditions like reproductive tissues and stresses in which orphan gene expression is high.

2. Ab initio gene prediction:

Pick one of the 2 ab initio predictions below:

Run BRAKER (See details here):
- Align RNA-Seq with a splice-aware aligner (STAR or HISAT2 preferred, HISAT2 used here)
- Generate a BAM file for each SRA-SRR id, then merge them into a single sorted BAM file
- Run BRAKER
Run MAKER (See details here):
- Align RNA-Seq with a splice-aware aligner (STAR or HISAT2 preferred, HISAT2 used here)
- Generate a BAM file for each SRA-SRR id, then merge them into a single sorted BAM file
- Run Trinity to generate a transcriptome assembly using the BAM file
- Run TransDecoder on the Trinity transcripts to predict ORFs and translate them to protein
- Run MAKER with transcripts (Trinity) and proteins (TransDecoder and SwissProt) in homology-only mode
- Use the MAKER predictions to train SNAP and AUGUSTUS. Self-train GeneMark.
- Run a second round of MAKER with the ab initio predictions from SNAP, AUGUSTUS, and GeneMark plus the results from the previous MAKER round.
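The alignment and merging steps shared by both options, followed by a BRAKER call, might look like the sketch below. It only builds the command lines and does not execute them. The FASTQ naming (`<run>_1.fastq`, `<run>_2.fastq`), the index name, the output file names, and the species label are assumptions for illustration. The flags shown are the commonly documented ones for HISAT2, samtools, and BRAKER, and should be checked against the versions you have installed.

```python
from typing import List

Command = List[str]


def alignment_plan(
    run_ids: List[str],
    index: str,
    merged_bam: str = "merged.bam",  # hypothetical output name
    threads: int = 4,
) -> List[Command]:
    """Build HISAT2 + samtools commands: one sorted BAM per run, then one merge."""
    plan: List[Command] = []
    sorted_bams: List[str] = []
    for run in run_ids:
        sam, bam = f"{run}.sam", f"{run}.sorted.bam"
        # Paired-end alignment; the FASTQ file names are assumed, not prescribed.
        plan.append(["hisat2", "-p", str(threads), "-x", index,
                     "-1", f"{run}_1.fastq", "-2", f"{run}_2.fastq", "-S", sam])
        # Coordinate-sort the alignments into a BAM file.
        plan.append(["samtools", "sort", "-@", str(threads), "-o", bam, sam])
        sorted_bams.append(bam)
    # Merging coordinate-sorted inputs yields a single sorted BAM.
    plan.append(["samtools", "merge", "-@", str(threads), merged_bam] + sorted_bams)
    return plan


def braker_cmd(genome: str, merged_bam: str, species: str, cores: int = 4) -> Command:
    """Build a BRAKER command that uses the merged RNA-Seq BAM as evidence."""
    return ["braker.pl", f"--genome={genome}", f"--bam={merged_bam}",
            f"--species={species}", f"--cores={cores}"]


example_plan = alignment_plan(["SRR1", "SRR2"], "genome_index")
example_braker = braker_cmd("genome.fa", "merged.bam", "my_species")
```

Each command can be passed to `subprocess.run(cmd, check=True)` once the paths have been verified. Keeping plan construction separate from execution makes it easy to inspect or log the commands first.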

3. Direct Inference evidence-based predictions (See details here):

We provide an automated pipeline for evidence-based predictions (See details here)

Align RNA-Seq with a splice-aware aligner (STAR or HISAT2 preferred, HISAT2 used here)
Generate a BAM file for each SRA-SRR id
For each BAM file, use multiple transcript assemblers for genome-guided transcript assembly:
- CLASS2
- StringTie
- Cufflinks
Run Portcullis to remove invalid splice junctions
Consolidate transcripts and generate a non-redundant set of transcripts using Mikado
Predict ORFs on these consolidated transcripts using TransDecoder
Pick the best transcripts using all of the above information with Mikado pick
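To make the idea behind junction filtering and consolidation concrete, here is a deliberately simplified Python sketch. It is not how Portcullis or Mikado work internally. It only illustrates the principle: drop transcripts that use an unvalidated splice junction, then keep one transcript per distinct intron chain. The data layout, function names, and transcript IDs are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Exon = Tuple[int, int]          # (start, end), 1-based inclusive
Intron = Tuple[str, int, int]   # (chromosome, start, end)
Transcript = Tuple[str, str, List[Exon]]  # (chromosome, strand, exons)


def introns(chrom: str, exons: List[Exon]) -> Tuple[Intron, ...]:
    """Derive the intron chain from the gaps between consecutive exons."""
    ordered = sorted(exons)
    return tuple(
        (chrom, left[1] + 1, right[0] - 1)
        for left, right in zip(ordered, ordered[1:])
    )


def consolidate(
    transcripts: Dict[str, Transcript],
    valid_junctions: Set[Intron],
) -> List[str]:
    """Return IDs of transcripts with only validated junctions, one per intron chain."""
    seen = set()
    kept: List[str] = []
    for tid in sorted(transcripts):
        chrom, strand, exons = transcripts[tid]
        chain = introns(chrom, exons)
        if not chain:
            # Single-exon transcript: no junctions to validate, deduplicate by span.
            key = (chrom, strand, tuple(sorted(exons)))
        elif all(intron in valid_junctions for intron in chain):
            key = (chrom, strand, chain)
        else:
            continue  # at least one junction failed validation
        if key not in seen:
            seen.add(key)
            kept.append(tid)
    return kept


# Hypothetical toy input from three assemblers.
example_transcripts = {
    "st.1": ("chr1", "+", [(100, 200), (301, 400)]),
    "cuff.1": ("chr1", "+", [(90, 200), (301, 420)]),   # same intron chain as st.1
    "cl.1": ("chr1", "+", [(100, 200), (351, 400)]),    # uses an unvalidated junction
    "st.2": ("chr1", "-", [(500, 600)]),                # single exon
}
example_kept = consolidate(example_transcripts, {("chr1", 201, 300)})
```

In the real workflow the choice between redundant transcripts is made by Mikado's scoring, which also uses the TransDecoder ORFs. The sketch simply keeps the first ID in sorted order.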

4. Combine ab initio and Direct Inference evidence-based predictions:

If you ran BRAKER in step 2, run 4.1:

4.1 Merge BRAKER with Direct Inference (BIND) (See details here):

Use Mikado to combine BRAKER-generated predictions with Direct Inference evidence-based predictions.

If you ran MAKER in step 2, run 4.2:

4.2 Merge MAKER with Direct Inference (MIND) (See details here):

Use Mikado to combine MAKER-generated predictions with Direct Inference evidence-based predictions.
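Mikado takes its inputs as a tab-delimited list file, and the merge can be prepared by writing one line per prediction set. The sketch below assumes the column layout described in Mikado's documentation (file path, unique label, strandedness, optional score); treat that layout as an assumption and verify it against your Mikado version. The file names, labels, and scores are hypothetical.

```python
from typing import List, Tuple

# (path, label, stranded, score): the column order is an assumption based on
# Mikado's documented list format and should be checked before use.
Entry = Tuple[str, str, bool, float]


def mikado_list(entries: List[Entry]) -> str:
    """Render a tab-delimited Mikado input list, one prediction set per line."""
    labels = [label for _, label, _, _ in entries]
    if len(labels) != len(set(labels)):
        raise ValueError("each input needs a unique label")
    lines = [
        "\t".join([path, label, str(stranded), str(score)])
        for path, label, stranded, score in entries
    ]
    return "\n".join(lines) + "\n"


# Hypothetical BIND inputs: BRAKER predictions plus Direct Inference transcripts.
example_list = mikado_list([
    ("braker.gff3", "braker", True, 1.0),
    ("direct_inference.gff3", "di", True, 0.0),
])
```

For MIND, the first entry would point at the MAKER predictions instead. Writing `example_list` to a file gives you the list that the Mikado configuration step reads.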

5. Evaluate your predictions (See details here):

Run BUSCO to see how well the conserved genes are represented in your final predictions
Run OrthoFinder to find and annotate orthologs present in your predictions
Run phylostratr to find orphan genes in your predictions
Add functional annotation to your genes using homology and InterProScan
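When comparing several prediction sets, it helps to pull the BUSCO scores into a structured form. The sketch below parses the one-line score string that BUSCO writes in its short summary (for example `C:..%[S:..%,D:..%],F:..%,M:..%,n:..`). The numbers in the example line are made up for illustration and are not results from this workflow.

```python
import re
from typing import Dict, Union

# Matches the compact BUSCO score line: Complete [Single, Duplicated],
# Fragmented, Missing, and the total number of BUSCO groups searched.
BUSCO_LINE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(text: str) -> Dict[str, Union[float, int]]:
    """Extract percentages (C, S, D, F, M) and the group count n from a summary."""
    match = BUSCO_LINE.search(text)
    if match is None:
        raise ValueError("no BUSCO score line found")
    scores: Dict[str, Union[float, int]] = {
        key: float(match.group(key)) for key in ("C", "S", "D", "F", "M")
    }
    scores["n"] = int(match.group("n"))
    return scores


# Made-up example line, only to show the expected shape of the input.
example_scores = parse_busco_summary("\tC:95.2%[S:90.1%,D:5.1%],F:2.3%,M:2.5%,n:1440\n")
```

A high duplicated fraction (D) after merging may point to redundant gene models, so it is worth checking alongside the complete fraction (C).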

Prediction tools include:

Tool	Purpose
SRA Tools (v. 2.9.6)	SRA access
HISAT2 (v. 2.2.0)	Alignment
STAR (v. 2.7.7a)	Alignment
Kallisto (v. 0.46.2)	Quantification
Samtools (v. 1.10)	SAM/BAM processing
CLASS2 (v. 2.1.7)	Transcript assembly
StringTie (v. 1.3.3)	Transcript assembly
Cufflinks (v. 2.2.1)	Transcript assembly
Trinity (v. 2.6.6)	Transcript assembly
Portcullis (v. 1.2.2)	Splice junction filtering
TransDecoder (v. 3.0.1)	CDS prediction
Mikado (v. 2.0)	Direct Inference prediction
phylostratr (v. 0.2.0)	Phylostratigraphy
BLAST (v. 3.11.0)	Sequence similarity search
BRAKER (v. 2.1.2)	Ab initio prediction
MAKER (v. 2.31.10)	Ab initio prediction
GMAP-GSNAP (v. 2019-05-12)	Alignment
GeneMark (v. 4.83)	Ab initio prediction
