
Seq-N-Slide: Illumina sequencing data analysis pipelines

Usage overview

Navigate to a clean new project directory. This is where all the results will end up.

cd <project dir>

Download the code from GitHub, which will create the sns sub-directory with all the code.

git clone --depth 1 https://github.com/igordot/sns

Scan a directory that contains FASTQ files to be used as input. This can be run multiple times if there are FASTQs in different directories.

sns/gather-fastqs <fastq dir>

All found files will be added to the samples.fastq-raw.csv file. It can be modified to change sample names, remove samples, or manually add samples. The first column is the sample name, the second column is the R1 FASTQ, and the third column is the R2 FASTQ (if available). Each line contains a single FASTQ (or FASTQ pair for paired-end experiments). If one sample has multiple FASTQs, each one will be on a different line. Multiple FASTQs for the same sample will be merged based on sample name.
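Before starting a run, it can help to confirm how many FASTQ lines each sample has in the sample sheet. The following is a minimal sketch, not part of SNS: the function name count_fastqs_per_sample is hypothetical, and it assumes the sheet is plain comma-separated text with the sample name in the first column, no header row, and no quoted commas.

```shell
# Hypothetical helper (not part of SNS): count the FASTQ lines per sample.
# Assumes plain comma-separated lines, sample name in column 1, no header row.
count_fastqs_per_sample() {
  # Skip empty lines, tally column 1, then print "sample,count" sorted by name.
  awk -F',' 'NF > 0 { n[$1]++ } END { for (s in n) print s "," n[s] }' "$1" | sort
}

# Usage: count_fastqs_per_sample samples.fastq-raw.csv
```

A sample with a count above 1 has multiple FASTQs (or FASTQ pairs) that will be merged under that sample name.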

Specify a reference genome (only hg19/mm10/dm3/dm6 are currently guaranteed to work).

sns/generate-settings <genome>

Run the analysis using a specific route (a set of analysis steps).

sns/run <route>

Check for potential problems.

grep "ERROR:" logs-qsub/*

There should be no matches. If there are any results, there was a problem. Check the specific log files where the errors are found for more info.
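To see only which log files need attention, the same check can be wrapped in a small function. This is a sketch under stated assumptions: check_logs is a hypothetical name, not an SNS command, and it relies only on the standard grep -l option, which prints the names of matching files instead of the matching lines.

```shell
# Hypothetical helper (not part of SNS): list log files that contain "ERROR:".
# Returns 1 if any file matches, 0 otherwise.
check_logs() {
  log_dir="${1:-logs-qsub}"
  # grep -l prints each matching file name once and exits 0 if anything matched.
  if grep -l "ERROR:" "$log_dir"/* 2>/dev/null; then
    return 1
  else
    # Also reached when the directory is empty or missing.
    echo "no errors found in $log_dir"
    return 0
  fi
}

# Usage: check_logs            (defaults to logs-qsub)
#        check_logs logs-qsub
```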

Routes

Routes are different analysis workflows. Generic routes are sample-centric (the same analysis is performed for each sample). Available routes:

  • rna-star: RNA-seq using STAR. Generates BAMs, normalized bigWigs, counts matrix, and various QC metrics.
  • rna-rsem: RNA-seq using RSEM. Generates FPKM/TPM/counts matrix and various QC metrics.
  • rna-snv: RNA-seq variant detection. Generates BAMs, VCFs, and various QC metrics.
  • wgbs: WGBS methylation analysis.
  • rrbs: RRBS methylation analysis.
  • wes: Whole genome/exome/targeted variant detection. Generates BAMs, VCFs, and various QC metrics.
  • atac: ATAC-seq. Generates BAMs, bigWigs, peaks, nucleosome positions, and various QC metrics.
  • species: Species/metagenomics/contamination analysis.

There are additional routes for comparing groups of samples after individual samples are processed with a generic route. They depend on the output of the generic routes and must be run from the same directory. Before running, manually add proper group names or pairs to the samples.groups.csv or samples.pairs.csv files (depending on the comparison type). Available comparison routes:

Output

  • Directories for different output types (such as BAMs or bigWigs) containing files for each sample.
  • summary-combined.*.csv: Combined segment summaries table that provides a comprehensive overview of the project.
  • logs-* directories: Most stdout/stderr output will be placed here. The information can be used for tracking progress and troubleshooting.

Each route has a description with more specific details.

About

SNS is designed to work on the NYULMC HPC cluster using the Sun Grid Engine job scheduler. It may require significant modifications to work in other environments.

SNS consists of multiple routes (or workflows). Each route contains multiple segments (or steps).

If there is a problem with any of the results, delete the broken files and re-run SNS. It will generate any missing output. Similarly, you can add additional samples and only the new ones will be processed when the route is re-run.

Most output and sample sheets are in a CSV format for macOS Quick Look (spacebar file preview) compatibility.

There is a copy of the code in each project directory for reproducibility. If you modify the code, the changes will not affect other projects. If you repeat the analysis with more samples in the future, the same code will be used.

FAQs

Coming soon.
