GithubHelp home page GithubHelp logo

guigolab / chip-nf Goto Github PK

View Code? Open in Web Editor NEW
17.0 7.0 4.0 6.31 MB

An automated ChIP-seq pipeline using Nextfow

Awk 0.80% Shell 1.29% Nextflow 84.39% Dockerfile 13.51%
chip-seq nextflow pipeline ngs crg guigo guigolab-nf

chip-nf's Introduction

ChIP-nf

Nextflow Build Status

A Nextflow pipeline for processing ChIP-seq data.

Installing Nextflow

Nextflow can be installed by using the following command:

curl -fsSL get.nextflow.io | bash

Running the pipeline

First you need to pull the pipeline using Nextflow:

$ nextflow pull guigolab/chip-nf
Checking guigolab/chip-nf ...
 downloaded from https://github.com/guigolab/chip-nf.git

You can get the pipeline help with the following command:

$ nextflow run chip-nf --help

N E X T F L O W  ~  version 0.24.1
Launching `guigolab/chip-nf` [nostalgic_franklin] - revision: 974a45c356 [master]

C H I P - N F ~ ChIP-seq Pipeline
---------------------------------
Run ChIP-seq analyses on a set of data.

Usage:
    chipseq-pipeline.nf --index TSV_FILE --genome GENOME_FILE [OPTION]...

Options:
    --help                              Show this message and exit.
    --index TSV_FILE                    Tab separted file containing information about the data.
    --genome GENOME_FILE                Reference genome file.
    --genome-index GENOME_INDEX_ FILE   Reference genome index file.
    --genome-size GENOME_SIZE           Reference genome size for MACS2 callpeaks. Must be one of
                                        MACS2 precomputed sizes: hs, mm, dm, ce. (Default: hs)
    --mismatches MISMATCHES             Sets the maximum number/percentage of mismatches allowed for a read (Default: 2).
    --multimaps MULTIMAPS               Sets the maximum number of mappings allowed for a read (Default: 10).
    --min-matched-bases BASES           Sets the minimum number/percentage of bases that have to match with the reference (Default: 0.80).
    --quality-threshold THRESHOLD       Sets the sequence quality threshold for a base to be considered as low-quality (Default: 26).
    --fragment-length LENGTH            Sets the fragment length globally for all samples (Default: 200).
    --remove-duplicates                 Remove duplicate alignments instead of just flagging them (Default: false).
    --rescale                           Rescale peak scores to conform to the format supported by the
                                        UCSC genome browser (score must be <1000) (Default: false).
    --shift                             Move fragments ends and apply global extsize in peak calling. (Default: false).

Input

The input data and metadata should be specified using a tab separated file and passing it to the pipeline command with the option --index. Here is an example of the file format:

sample1    sample1_run1     /path/to/sample1_run1.fastq.gz    -           H3
sample1    sample1_run2     /path/to/sample1_run2.fastq.gz    -           H3
sample1    sample1_run3     /path/to/sample1_run3.fastq.gz    -           H3
sample1    sample1_run4     /path/to/sample1_run4.fastq.gz    -           H3
sample2    sample2_run1     /path/to/sample2_run1.fastq.gz    control1    H3K4me2
control1   control1_run1    /path/to/control1_run1.fastq.gz   control1    input

The fields in the file correspond to:

  1. identifier used for merging the BAM files

  2. single run identifier

  3. path to the fastq file to be processed

  4. identifier of the input or - if no control is used

  5. mark/histone or input if the line refers to a control

  6. optional sample fragment length. If not specified the fragment length is estimated using SPP

Output

The pipeline will produce the following output data:

  • Alignments

  • pileupSignal, pileup signal tracks

  • fcSignal, fold enrichment signal tracks

  • pvalueSignal, -log_10(P) signal tracks

  • narrowPeak, peak locations with peak summit, pvalue and qvalue (BED6+4)

  • broadPeak, similar to narrowPeak (BED6+3)

  • gappedPeak, both narrow and broad peaks (BED12+3)

Check MACS2 output files for details.

The output data information is written to a file called chipseq-pipeline.db created in the folder from where the pipeline is run. Here is an example of the db file:

sample1   /path/to/results/peakOut/sample1.pileup_signal.bw    H3         255     pileupSignal    0.9960   0.4393
sample1   /path/to/results/peakOut/sample1_peaks.narrowPeak    H3         255     narrowPeak      0.9960   0.4393
sample1   /path/to/results/sample1.bam                         H3         255     Alignments      0.9960   0.4393
sample1   /path/to/results/peakOut/sample1_peaks.gappedPeak    H3         255     gappedPeak      0.9960   0.4393
sample1   /path/to/results/peakOut/sample1_peaks.broadPeak     H3         255     broadPeak       0.9960   0.4393
sample2   /path/to/results/peakOut/sample2_peaks.gappedPeak    H3K4me2    200     gappedPeak      0.9995   0.7216
sample2   /path/to/results/peakOut/sample2.fc_signal.bw        H3K4me2    200     fcSignal        0.9995   0.7216
sample2   /path/to/results/peakOut/sample2.pval_signal.bw      H3K4me2    200     pvalueSignal    0.9995   0.7216
sample2   /path/to/results/peakOut/sample2_peaks.broadPeak     H3K4me2    200     broadPeak       0.9995   0.7216
sample2   /path/to/results/peakOut/sample2.pileup_signal.bw    H3K4me2    200     pileupSignal    0.9995   0.7216
sample2   /path/to/results/peakOut/sample2_peaks.narrowPeak    H3K4me2    200     narrowPeak      0.9995   0.7216
sample2   /path/to/results/sample2_GCCAAT_primary.bam          H3K4me2    200     Alignments      0.9995   0.7216

The fields in the file correspond to:

  1. merge identifier

  2. path

  3. mark/histone

  4. (estimated) fragment length

  5. data type

  6. NRF (Nonredundant Fraction)

  7. FRiP (Fraction of Reads in Peaks)

chip-nf's People

Contributors

emi80 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

chip-nf's Issues

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.