SPLICE

Analysis pipeline for long-read RNA-seq data using Nanopore technology

Requirements

  • python3
  • minimap2 (v2.17 or higher)
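
The minimap2 requirement can be checked before running the pipeline. The sketch below assumes that `minimap2 --version` prints a string such as `2.17-r941` (major.minor followed by a revision suffix) and that `minimap2` is on PATH; the helper names are illustrative and not part of SPLICE.

```python
import re
import shutil
import subprocess

MIN_VERSION = (2, 17)  # minimum minimap2 version stated in the requirements


def parse_minimap2_version(text):
    """Extract (major, minor) from a version string such as '2.17-r941'."""
    match = re.match(r"\s*(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def is_supported(text, minimum=MIN_VERSION):
    """Return True when the parsed version is at least `minimum`."""
    return parse_minimap2_version(text) >= minimum


def installed_minimap2_ok():
    """Return True/False for the minimap2 found on PATH, or None if absent."""
    exe = shutil.which("minimap2")  # ASSUMPTION: minimap2 is on PATH
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    return is_supported(result.stdout)
```

Versions are compared as integer pairs rather than as decimals, so 2.9 is correctly treated as older than 2.17.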

Input file

FASTQ file

Usage

Step 1: Preparation of the reference transcriptome file

Download the reference transcriptome file from the UCSC Genome Browser (https://genome.ucsc.edu/):

  1. Select Table Browser from the Tools tab.
  2. Select "Genes and Gene Predictions" in the group section.
  3. Select the desired database in the Track section and download the file.

Number the exons of the reference transcriptome.
If you use files from two databases (e.g. GENCODE and RefSeq), sort them by gene name and transcript name, and remove redundant transcripts.

$ cd <path to SPLICE>
$ sh ref_exonnum.sh <path to reference transcriptome> <path to reference genome sequence (FASTA)> <output directory> 
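
When two databases are combined, the sorting and de-duplication mentioned above has to happen before `ref_exonnum.sh` is run. The following is only a rough sketch of one way to do it. It assumes the UCSC genePred "all fields" layout with a leading bin column (name in column 2, name2/gene in column 13), and it assumes that "redundant" means an identical chromosome, strand and exon structure; neither assumption is confirmed by SPLICE, so adjust the column positions and the rule to your data.

```python
# Column positions ASSUMED from the UCSC genePred "all fields" layout with a
# leading bin column; adjust them to match the downloaded table.
NAME, CHROM, STRAND = 1, 2, 3
EXON_STARTS, EXON_ENDS, GENE = 9, 10, 12


def read_table(path):
    """Read a tab-separated table, skipping blank lines and '#' header lines."""
    with open(path) as handle:
        return [line.rstrip("\n").split("\t") for line in handle
                if line.strip() and not line.startswith("#")]


def merge_annotations(*tables):
    """Merge row lists, sort by gene then transcript, drop repeated structures."""
    rows = [row for table in tables for row in table]
    rows.sort(key=lambda r: (r[GENE], r[NAME]))
    seen = set()
    merged = []
    for row in rows:
        # ASSUMPTION: "redundant" = same chromosome, strand and exon coordinates.
        structure = (row[CHROM], row[STRAND], row[EXON_STARTS], row[EXON_ENDS])
        if structure not in seen:
            seen.add(structure)
            merged.append(row)
    return merged
```

Because rows are sorted before de-duplication, the transcript whose name sorts first is the one that is kept when two entries share a structure.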

Edit the configure file

  • The path to the reference genome file (FASTA) should be specified on the REF_GENOME_FA line.
  • The path to the output .exonnum file should be specified on the REF_TRANSCRIPT line.
  • The path to the output .exonnum.fa file should be specified on the REF_TRANSCRIPT_FA line.
  • The path to the minimap2 executable should be specified on the MINIMAP2 line.
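
A missing or mistyped path in the configure file is an easy mistake to make, so it can help to check the four entries before starting Step 2. The sketch below assumes the file uses `KEY=VALUE` lines and that every value is a full path; the actual syntax of the SPLICE configure file is not described here, and the function names are hypothetical.

```python
import os

REQUIRED_KEYS = ("REF_GENOME_FA", "REF_TRANSCRIPT", "REF_TRANSCRIPT_FA", "MINIMAP2")


def read_settings(text):
    """Parse KEY=VALUE lines (ASSUMED syntax), ignoring blanks and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip("\"'")
    return settings


def missing_paths(settings):
    """Return the required keys that are unset or point to a nonexistent path."""
    return [key for key in REQUIRED_KEYS
            if not settings.get(key) or not os.path.exists(settings[key])]
```

An empty list from `missing_paths(read_settings(open(path).read()))` means all four entries point to existing files.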

Step 2: Annotation to reference transcriptome

$ sh SPLICE_annot.sh <path to FASTQ> <output directory>

Step 3: Analysis of expression levels (optionally across multiple samples)

For multi-sample analysis, move all .annot and .novel_exon files into a single directory.

$ sh SPLICE_exp.sh <output directory of Step2 or directory of `.annot` and `.novel_exon` files (multi-sample analysis)> <output directory>
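
If the Step 2 outputs of different samples ended up in separate directories, gathering them for the multi-sample run can be scripted. This is a minimal sketch, assuming the outputs are recognizable by the `.annot` and `.novel_exon` filename suffixes. It copies rather than moves the files so the originals stay in place, and `collect_outputs` is an illustrative name, not part of SPLICE.

```python
import shutil
from pathlib import Path

SUFFIXES = (".annot", ".novel_exon")


def collect_outputs(sample_dirs, target_dir):
    """Copy every .annot and .novel_exon file from sample_dirs into target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for sample_dir in sample_dirs:
        for path in sorted(Path(sample_dir).iterdir()):
            if path.is_file() and path.name.endswith(SUFFIXES):
                destination = target / path.name
                if destination.exists():
                    # Refuse to overwrite: two samples share a file name.
                    raise FileExistsError(f"{destination} already exists")
                shutil.copy2(path, destination)
                copied.append(destination.name)
    return copied
```

The target directory can then be passed as the first argument of `SPLICE_exp.sh`.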

Output

expression.tsv - Output file of transcript expression levels (number of supporting reads)

gene   transcript      known/novel        coding/non-coding  transcript length  novel information                                             sample1  sample2  sampleN
GAPDH  NM_001256799.2  known              coding             full               -                                                             71       50       31
CSDE1  NM_001242891.1  known              coding             partial            -                                                             346      40       88
SAP18  -               novel_exon_length  coding             -                  *,6,8,10,*/*,k,l,k,*/21140681,21147186/*,0,0,0,-35,0,0,0,0,*  8        0        9
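
For downstream work it is often convenient to collapse the table above to one row per gene. The sketch below assumes `expression.tsv` is tab-separated with a header row, six annotation columns (gene through novel information), and one read-count column per sample after that; `reads_per_gene` is an illustrative helper, not part of SPLICE.

```python
import csv
from collections import defaultdict

N_ANNOTATION_COLUMNS = 6  # gene ... novel information; sample columns follow


def reads_per_gene(handle):
    """Sum supporting reads per gene and per sample from an open expression.tsv."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[N_ANNOTATION_COLUMNS:]
    totals = defaultdict(lambda: [0] * len(samples))
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene = row[0]
        for i, value in enumerate(row[N_ANNOTATION_COLUMNS:]):
            totals[gene][i] += int(value)
    return samples, dict(totals)
```

A typical call would be `with open("expression.tsv", newline="") as fh: samples, totals = reads_per_gene(fh)`.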

Notation of novel transcript

.fusion - Output file of fusion transcript expression levels (number of supporting reads) for each sample

Number of reads  Read frequency(%)  GeneA/B         ChrA/B       BreakpointA/B                        Read IDs
150              28.571             UQCRFS1/YWHAE   chr19/chr17  29207585-29207585/1364879-1364879    86ada1...
29               7.143              RPS6KA5/TMSB4X  chr14/chrX   91060446-91060446/12977041-12977041  38a936...
12               5.128              TMED10/VPS4B    chr14/chr18  75132140-75132140/63390367-63390367  1e0cbc...
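
The paired A/B fields in the table above are easier to work with once split into separate values. The sketch below assumes the `.fusion` file is tab-separated with the six columns shown and a header line; the Read IDs field is kept verbatim because its internal format is not described here. The `Fusion` type and function names are illustrative.

```python
from typing import NamedTuple, Tuple


class Fusion(NamedTuple):
    reads: int                 # number of supporting reads
    frequency: float           # read frequency in percent
    genes: Tuple[str, str]     # (GeneA, GeneB)
    chroms: Tuple[str, str]    # (ChrA, ChrB)
    breakpoints: Tuple[Tuple[int, int], Tuple[int, int]]  # (start, end) for A and B
    read_ids: str              # kept as-is


def _split_pair(text):
    a, b = text.split("/")
    return a, b


def _parse_range(text):
    start, end = text.split("-")
    return int(start), int(end)


def parse_fusion_line(line):
    """Parse one data line of a .fusion file (ASSUMED tab-separated)."""
    fields = line.rstrip("\n").split("\t")
    reads, freq, genes, chroms, bps = fields[:5]
    read_ids = fields[5] if len(fields) > 5 else ""
    bp_a, bp_b = _split_pair(bps)
    return Fusion(int(reads), float(freq), _split_pair(genes), _split_pair(chroms),
                  (_parse_range(bp_a), _parse_range(bp_b)), read_ids)


def read_fusions(handle):
    """Yield Fusion records, skipping the header and blank lines."""
    for line in handle:
        if line.strip() and line.split("\t", 1)[0].isdigit():
            yield parse_fusion_line(line)
```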

Each breakpoint is reported as a range (start-end) obtained by merging neighboring breakpoints.
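
To illustrate how individual breakpoints can turn into ranges, here is a minimal sketch that chains positions lying within a maximum distance of each other, using the MAX_FUSION_BP_MERGE value (5) as the default. The exact merging rule SPLICE applies is not documented here, so the chaining of consecutive positions and the inclusive distance test are assumptions.

```python
def merge_breakpoints(positions, max_distance=5):
    """Merge positions into (start, end) ranges.

    ASSUMPTION: a position joins the current range when it lies at most
    `max_distance` bases beyond that range's current end.
    """
    ranges = []
    for pos in sorted(positions):
        if ranges and pos - ranges[-1][1] <= max_distance:
            ranges[-1][1] = pos          # extend the current range
        else:
            ranges.append([pos, pos])    # start a new range
    return [tuple(r) for r in ranges]
```

A breakpoint seen at only one position yields a range whose start and end are equal, as in the example rows above.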

Example

Preparation of the reference transcriptome file

$ git clone https://github.com/hkiyose/SPLICE
$ cd SPLICE
$ sh ref_exonnum.sh <path to reference transcriptome file> <path to reference genome sequence (FASTA)> ./example/ref

Edit the configure file as described above.

Annotation to reference transcriptome.

$ sh SPLICE_annot.sh ./example/fastq/sample1_test.fastq ./example/annot
$ sh SPLICE_annot.sh ./example/fastq/sample2_test.fastq ./example/annot
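
With more than a couple of samples, the per-sample calls above can be generated instead of typed. The sketch below builds one `sh SPLICE_annot.sh <FASTQ> <output directory>` command per file, assuming the FASTQ files end in `.fastq` and that the script is run from the SPLICE directory; the function names are illustrative.

```python
import subprocess
from pathlib import Path


def annot_commands(fastq_dir, output_dir, script="SPLICE_annot.sh"):
    """Build one annotation command per *.fastq file in fastq_dir."""
    fastqs = sorted(Path(fastq_dir).glob("*.fastq"))
    return [["sh", script, str(fq), str(output_dir)] for fq in fastqs]


def run_all(commands):
    """Run the commands one after another, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

For the example data this would be `run_all(annot_commands("./example/fastq", "./example/annot"))`.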

Analysis of expression levels

$ sh SPLICE_exp.sh ./example/annot ./example/exp

Installation and usage via Docker

Install Docker on your computer, then build a Docker image with the following commands.

$ git clone https://github.com/hkiyose/SPLICE.git
$ cd <path to SPLICE>
$ docker build -t splice .

The following command mounts the host directory containing the data to the container. Refer to Step 1 of Usage to download the reference data.

$ docker run --rm -it \
  -v <path to directory of reference transcriptome and reference genome sequence (FASTA)>:/ref \
  -v <path to directory of sample data(FASTQ)>:/sample_input \
  -v <path to output directory>:/sample_output \
  splice

Then run the pipeline as described in Usage.

Parameter settings in configuration file

If you want to use different parameters, please change the configuration file.

BQ_FILT - Read quality cutoff. Minimum average base quality score (15)
MIN_SC_LEN - Minimum length of the softclip region to be remapped (60)
MQ_FILT - Mapping quality cutoff (0)
MIN_FUSION_DIST - Minimum distance of each transcript in the fusion transcript (200000)
MAX_FUSION_BP_MERGE - Maximum distance to merge fusion gene breakpoints (5)
MIN_FUSION_READ - Minimum number of supporting reads for fusion transcripts (1)
MIN_FUSION_FREQ - Minimum frequency of the fusion transcript in the total amount of the gene (%) (0.1)
MQ_FILT_NOVEL_EXON - Mapping quality cutoff of novel exon (1)
MIN_READ_NUM - Minimum number of supporting reads (3)
MIN_READ_FREQ - Minimum frequency of the transcript in the total amount of the gene (%) (1)
RANGE_SJ_EVA - Maximum change of novel exon length for evaluating the error rate (20)
ERR_RATE_FILT - Mapping error rate cutoff at splice junction sites (%) (20)
MIN_NOVEL_LEN_GAP - Minimum change of novel exon length (5)
MIN_NOVEL_EXON_LEN - Minimum length of novel exon (60)
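
To make the interaction of MIN_READ_NUM and MIN_READ_FREQ concrete, the sketch below applies both cutoffs to the transcripts of a single gene. It assumes the frequency is the transcript's share of all reads assigned to that gene and that both cutoffs are inclusive; this is an illustration of the parameter descriptions above, not SPLICE's actual filtering code.

```python
def filter_transcripts(read_counts, min_read_num=3, min_read_freq=1.0):
    """Keep transcripts of one gene that pass both read-support cutoffs.

    read_counts maps transcript name -> number of supporting reads.
    """
    total = sum(read_counts.values())
    kept = {}
    for name, reads in read_counts.items():
        freq = 100.0 * reads / total if total else 0.0  # percent of gene reads
        # ASSUMPTION: "minimum" is inclusive for both parameters.
        if reads >= min_read_num and freq >= min_read_freq:
            kept[name] = reads
    return kept
```

With the values listed above, a transcript with 4 reads in a gene with 506 reads in total passes MIN_READ_NUM but is dropped by MIN_READ_FREQ, since 4/506 is below 1%.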

License

GPLv3

Contact

Hiroki Kiyose - [email protected]
