
rnaseq's Introduction

Bulk RNAseq pipeline for paired-end and single-end reads

last updated 2020-01-07

Author: Noah Siegel

Clone repo

git clone https://github.com/nasiegel88/rnaseq.git
cd rnaseq

Create conda environment

conda env create -n rnaseq -f env/rnaseq.yml

This environment contains everything needed to run Snakemake, perform quality control of reads, and quantify reads against a reference transcriptome.

Download practice data from Himes et al., 2014 by first installing and activating the grabseq environment. Note: only do this for paired-end sequencing data. To test the Snakefile for single-end sequencing, either find different practice data or delete the reverse reads from Himes et al., 2014, for example with rm *_2.fastq.gz in the raw.dir directory.

conda env create -n grabseq -f env/grabseq.yml
conda activate grabseq
# switch to the raw data directory within rnaseq
cd raw.dir
grabseqs sra PRJNA229998
  • Note this may take a few hours to download

Or do it all at once with the bash script:

bash raw.dir/download_data.sh
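The single-end shortcut mentioned above (deleting the reverse reads) can also be sketched in Python with a dry-run default, so nothing is removed by accident. This is only an illustration: the helper names `find_reverse_reads` and `drop_reverse_reads` are hypothetical and not part of the repository, and the `*_2.fastq.gz` pattern is taken from the rm example above.

```python
from pathlib import Path


def find_reverse_reads(raw_dir, pattern="*_2.fastq.gz"):
    """Return the reverse-read files found directly in raw_dir, sorted by name."""
    return sorted(Path(raw_dir).glob(pattern))


def drop_reverse_reads(raw_dir, dry_run=True):
    """Delete reverse reads from raw_dir.

    With dry_run=True (the default) nothing is deleted; the function only
    returns the files that would be removed.
    """
    targets = find_reverse_reads(raw_dir)
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets
```

Calling `drop_reverse_reads("raw.dir")` first lists the candidates; passing `dry_run=False` performs the deletion.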

Next, change the paths in the configuration file to match your computer. I am only using the TruSeq-PE adapter, but it is possible to run this workflow with different adapter sequences. Currently, the configuration file supports both paired-end and single-end TruSeq adapters.

proj_name: name
contact:
  email: [email protected]
  person: John Doe
raw-data: /$PWD/rnaseq/raw.dir
scratch:  /$PWD/rnaseq/scratch
outputDIR: /$PWD/rnaseq/output
metadata: /$PWD/metadata.tsv
ref: /$PWD/rnaseq/refs/

# Organism
species:
  mouse: Mus_musculus.GRCm38.cdna.all.fa.gz
  human: Homo_sapiens.GRCh38.cdna.all.fa.gz

# Adapters
seq:
  PE: TruSeq2-PE.fa
  trueseq-pe: https://raw.githubusercontent.com/timflutre/trimmomatic/master/adapters/TruSeq2-PE.fa
  SE: TruSeq2-SE.fa
  trueseq-se: https://raw.githubusercontent.com/timflutre/trimmomatic/master/adapters/TruSeq2-SE.fa

# Downloads
transcriptome:
  human: ftp://ftp.ensembl.org/pub/release-99/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz
  mouse: ftp://ftp.ensembl.org/pub/release-102/fasta/mus_musculus/cdna/Mus_musculus.GRCm38.cdna.all.fa.gz


# Fastq file suffix, following the read pair designation
suffix: .fastq.gz
# suffix: _001.fastq.gz

# Read pair designations
r1_suf: 1
r2_suf: 2
#r1_suf: R1
#r2_suf: R2
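To see how suffix, r1_suf and r2_suf combine, here is a minimal sketch. It assumes file names follow the pattern sample, underscore, read designation, suffix (as the `*_2.fastq.gz` example earlier suggests); the Snakefiles may build names differently. The function names are hypothetical.

```python
def expected_pair(sample, r1_suf="1", r2_suf="2", suffix=".fastq.gz"):
    """Build the forward and reverse file names for one sample."""
    return (f"{sample}_{r1_suf}{suffix}", f"{sample}_{r2_suf}{suffix}")


def unpaired_samples(file_names, r1_suf="1", r2_suf="2", suffix=".fastq.gz"):
    """Return samples whose forward read has no matching reverse read."""
    names = set(file_names)
    r1_tail = f"_{r1_suf}{suffix}"
    missing = []
    for name in sorted(names):
        if name.endswith(r1_tail):
            sample = name[: -len(r1_tail)]
            reverse = expected_pair(sample, r1_suf, r2_suf, suffix)[1]
            if reverse not in names:
                missing.append(sample)
    return missing
```

With the commented-out alternatives (R1, R2 and _001.fastq.gz), `expected_pair("s", "R1", "R2", "_001.fastq.gz")` gives `s_R1_001.fastq.gz` and `s_R2_001.fastq.gz`.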

Lastly, some changes will need to be made to the snakefile...

# Snakemake file
configfile: "config.yaml"
import io 
import os
import pandas as pd
import pathlib
from snakemake.exceptions import print_exception, WorkflowError

#----SET VARIABLES----#
PROJ = config["proj_name"]
INPUTDIR = config["raw-data"]
SCRATCH = config["scratch"]
REFERENCE = config["ref"]
OUTPUTDIR = config["outputDIR"]
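The variables above are read straight from the configuration, so a missing key only surfaces as a KeyError when Snakemake starts. A small check like the following sketch could catch that earlier. It is not part of the repository; `REQUIRED_KEYS` and `missing_keys` are hypothetical names, and the key list simply mirrors the lookups shown above.

```python
# Top-level keys that the Snakefile snippet above reads from the config.
REQUIRED_KEYS = ("proj_name", "raw-data", "scratch", "ref", "outputDIR")


def missing_keys(config, required=REQUIRED_KEYS):
    """Return required top-level keys that are absent or empty in config."""
    return [key for key in required if not config.get(key)]


# Example with a deliberately incomplete config dictionary.
example = {"proj_name": "name", "raw-data": "/path/to/raw.dir"}
problems = missing_keys(example)
if problems:
    print("missing config keys:", ", ".join(problems))
```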

Use the variable assignments that match the kind of reads you have.

Paired end reads:

# Adapters
ADAPTER = config['seq']['PE']
SEQUENCE = config['seq']['trueseq-pe']

Single end reads:

# Adapters
SE_ADAPTER = config['seq']['SE']
SE_SEQUENCE = config['seq']['trueseq-se']

Specify which organism's transcriptome you want to align to. Currently, mouse and human are the only transcriptomes listed in the configuration file. Other reference transcriptomes can be found in the Ensembl database.

# Organism
TRANSCRIPTOME = config['transcriptome']['human']
SPECIES = config['species']['human']

If you are working with mouse samples, change the above code to...

# Organism
TRANSCRIPTOME = config['transcriptome']['mouse']
SPECIES = config['species']['mouse']
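Because both lines must name the same organism, one option is to select them through a single key. The sketch below is an illustration only: `select_reference` is a hypothetical helper, and the check that the URL ends in the species file name is an assumption based on the example configuration, where the two do match.

```python
from posixpath import basename
from urllib.parse import urlparse


def select_reference(config, organism):
    """Return (download URL, local file name) for one organism key."""
    try:
        url = config["transcriptome"][organism]
        file_name = config["species"][organism]
    except KeyError as err:
        raise KeyError(f"organism {organism!r} is not fully configured") from err
    # Assumption: the downloaded file keeps the name given under `species`.
    if basename(urlparse(url).path) != file_name:
        raise ValueError(f"{url} does not end in {file_name}")
    return url, file_name


# Example using the mouse entries from the configuration shown earlier.
config = {
    "species": {"mouse": "Mus_musculus.GRCm38.cdna.all.fa.gz"},
    "transcriptome": {
        "mouse": "ftp://ftp.ensembl.org/pub/release-102/fasta/mus_musculus/"
                 "cdna/Mus_musculus.GRCm38.cdna.all.fa.gz"
    },
}
TRANSCRIPTOME, SPECIES = select_reference(config, "mouse")
```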

Once everything is set up, activate the environment with conda activate rnaseq. You are then ready to run Snakemake.

Paired-end: snakemake -s Snakefile-paired.smk -j 2

Single-end: snakemake -s Snakefile-single.smk -j 2
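The two commands above differ only in the Snakefile, so they can be wrapped in a small launcher. The following is a sketch, not part of the repository; `snakemake_command` and `run` are hypothetical names. The optional `-n` flag is Snakemake's dry run, which lists the jobs without executing them.

```python
import subprocess

# Snakefile names as given in the commands above.
SNAKEFILES = {
    "paired": "Snakefile-paired.smk",
    "single": "Snakefile-single.smk",
}


def snakemake_command(layout, jobs=2, dry_run=False):
    """Build the snakemake argument list for 'paired' or 'single' reads."""
    if layout not in SNAKEFILES:
        raise ValueError(f"layout must be one of {sorted(SNAKEFILES)}")
    cmd = ["snakemake", "-s", SNAKEFILES[layout], "-j", str(jobs)]
    if dry_run:
        cmd.append("-n")  # dry run: show planned jobs only
    return cmd


def run(layout, jobs=2, dry_run=False):
    """Run snakemake in the current directory and raise if it fails."""
    return subprocess.run(snakemake_command(layout, jobs, dry_run), check=True)
```

For example, `run("paired", dry_run=True)` previews the paired-end workflow before committing to a full run.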

rnaseq's People

Contributors

nasiegel88, taylorreiter, bluegenes, ctb

Forkers

lmillergrp
