GithubHelp home page GithubHelp logo

general0710 / rna-seq-example Goto Github PK

View Code? Open in Web Editor NEW

This project forked from jmzeng1314/rna-seq-example

0.0 1.0 0.0 1.64 MB

An analysis of Arabidopsis RNA-seq data (hy5 mutant and wt, two replicates each; SRA accession SRX029582)

Shell 100.00%

rna-seq-example's Introduction

An Example Differential Expression Analysis Using RNA-Seq Data

This is an example analysis of RNA-seq data using open source tools R and Bioconductor. It starts with raw reads downloaded from the Short Read Archive (SRA), does quality assessment and improvement, mapping, and analysis.

Data

The data for this comes from Zhang et al., 2011, Genome-wide mapping of the HY5-mediated genenetworks in Arabidopsis that involve both transcriptional and post-transcriptional regulation (http://onlinelibrary.wiley.com/doi/10.1111/j.1365-313X.2010.04426.x/full). It is composed of two Arabidopsis thaliana ecotype Col-0 sets, one hy5 mutant and one wild type. The SRA accession is SRX029582.

  • GEO:GSM613465: wild type samples
    • SRR070570: replicate 1
    • SRR070571: replicate 2
  • GEO:GSM613466: hy5 mutant samples
    • SRR070572: replicate 1
    • SRR070573: replicate 2

These were sequenced on an Illumina Genome Analyzer after oligo(dT) selection and random hexamer priming.

Downloading

Read data was acquired from the SRA via wget, and checked with:

cd data/raw-reads/
md5sum -c *sra.md5

The TAIR10 Arabidopsis Genome was downloaded with:

wget -nd ftp://ftp.arabidopsis.org/home/tair/Sequences/whole_chromosomes/*fas

-nd tells wget not to mirror the directory structure.

FASTQ Extraction

FASTQ data was extracted with NCBI's SRA Toolkit's fastq-dump, used with xargs and find to run this in parallel:

find . -name "*sra" | xargs -n1 -P4 /share/apps/sratoolkit/fastq-dump

Process a Genome for GSNAP

cd data/athaliana-genome
gmap_build -d athaliana10 -C *.fas

The -C option tries to extract chromosome information from the FASTA header.

Raw Sequence Quality Assessment

First, we assess the quality with qrqc; see raw-read-qa.Rmd.

Scythe

sycthe is a 3'-end quality trimmer. Illumina adapters are removed from the raw reads:

find data/raw-reads/ -name "*.fastq" | xargs -n1 -I{} basename {} .fastq | xargs -n1 -P4 -I{} /share/apps/scythe/scythe -q sanger -a /share/apps/scythe/solexa_adapters.JNF.fa -o data/improved-reads/{}-trimmed.fastq data/raw-reads/{}.fastq

Sickle

sickle is a tool for trimming low-quality bases off of the 5'-end and 3'-end of reads.

Aligning Reads with GSNAP

I use a script (map-gsnap.sh) to process each alignment file. The purpose of using the script is that I can use it with xargs. Because gsnap returns results to standard out, we need to wrap the call in a script to allow this to be parallelizable.

find data/improved-reads/ -name "*final.fastq" | xargs -n1 -P4 bash map-gsnap.sh

I am using -N1 with gsnap, which allows for novel splicing. gsnap also allows for known splicing junctions to be used.

Converting SAM files to BAM files

Another task for xargs. As before, I use basename to keep just the unique file name as a key, and run samtools with the correct directory and extension. This allows me to change the extension too.

find data/alignments/ -name "*sam" | xargs -n1 -I{} basename {} .sam | xargs -n1 -I{} -P4 samtools view -b -S -o data/alignments/{}.bam data/alignments/{}.sam

rna-seq-example's People

Contributors

vsbuffalo avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.