GithubHelp home page GithubHelp logo

labbcb / methseq Goto Github PK

View Code? Open in Web Editor NEW
0.0 3.0 0.0 62 KB

Automation tool for high-throughput sequencing DNA methylation data

License: GNU General Public License v3.0

Python 35.43% WDL 64.57%
wgbs em-seq wdl cromwell

methseq's Introduction

MethSeq - Automation tool for high-throughput sequencing DNA methylation data

Supported sequencing protocols for identification of DNA methylation

  • WGBS
  • Pico-Methyl
  • EMSeq

Reduced-representation bisulfite sequencing (RRBS) is not supported!

Required software

  • Python 3
  • Cromwell workflow management system running in server mode
  • Docker (optional but highly recommended to improve reproducibility)

Required data

  • Paired-end raw sequencing FASTQ files (one file per strand)
  • Indexed reference genome using Bismark 0.22.2 and Bowtie 2.3.5.1

Used software via Docker container images

  • TrimGalore version 0.6.5
  • Bismark 0.22.2
  • Bowtie 2.3.5.1
  • FastQC

Your compute environment may not support Docker. Install those software locally and overwrite their paths using MethSeq command-line parameters.

Main data processing tasks

  • Trim and filter raw sequencing reads (TrimGalore!)
  • Align filtered reads to methylation-aware indexed reference genome (Bismark, Bowtie 2)
  • Deduplicate aligned reads to remove PCR-bias (Bismark)
  • Extract DNA methylation of CpG context genome-wide (Bismark)

Quality control and report tasks

  • Quality assessment of filtered sequencing reads (FastQC)
  • Overall report (Bismark)

Expected result files for each sample, independently of sequencing protocol

  • cpg_report Genome-wide DNA methylation in CpG context, contains strand information
  • cov Coverage file, doesn't contain strand information (only CpG with coverage)
  • bedgraph BedGraph file
  • mbias_png_1 M-Bias plot of forward strand (R1)
  • mbias_png_2 M-Bias Plot of reverse strand (R2, except PicoMethyl)
  • trim_stats_1 Trimming statistics of forward strand (R1)
  • trim_stats_2 Trimming statistics of reverse strand (R2)
  • align_stats Alignment stats
  • nucleotide_coverage
  • deduplicate_stats Deduplication statistics
  • mbias_stats M-Bias statistics (used to generate M-Bias plots)
  • splitting_stats
  • trim_qc_report_1 QC report of filtered reads of forward strand (R1)
  • trim_qc_report_zip_1 Zipped file containing QC statistics forward strand (R1)
  • trim_qc_report_2 QC report of filtered reads of reverse strand (R2)
  • trim_qc_report_zip_2 Zipped file containing QC statistics reverse strand (R2)
  • unmapped_qc_report_1 QC report of unmapped reads of forward strand (R1)
  • unmapped_qc_report_zip_1 Zipped file containing QC statistics forward strand (R1)
  • unmapped_qc_report_2 QC report of unmapped reads of reverse strand (R2, except PicoMethyl)
  • unmapped_qc_report_zip_2 Zipped file containing QC statistics reverse strand (R2, except PicoMethyl)
  • report Overall report

How to use

methseq accel|pico|emseq --fastq /path/to/fastqs [--fastq /path/to/other_fastqs] [trimming parameters] result_dir

By default, trimming step will only trim (Illumina) adapter sequences. Use the following trimming parameters to cut sequences after adapter removal. It is useful to remove methylation bias saw in M-Bias plot.

  • --five_prime_clip_R1 N remove N bases from the beginning (5') of forward reads (R1)
  • --three_prime_clip_R1 N remove N bases from the end (3') of forward reads (R1)
  • --five_prime_clip_R1 N remove N bases from the beginning (5') of reverse reads (R2)
  • --three_prime_clip_R1 N remove N bases from the end (3') of reverse reads (R2)

By default, trimming step will trim bases at the end (3') that have PHREAD score lower than 20. Use --quality N to change quality cutoff to N.

By default, trimming step will filter trimmed reads that are smaller than 20 bp. Use --length N to change read length cutoff to N.

MethSeq will

  1. Check if all samples are paired FASTQ files
  2. Check if indexed reference genome files exists
  3. Generate inputs JSON file
  4. Write workflow (WDL) and JSON files to result directory
  5. Submit WDL and JSON files to Cromwell server through its API
  6. Wait until workflow execution is completed
  7. If success, collect workflow output files copping (or moving) them to result directory

Extra: run MultiQC on result_dir folder!

Use cases

To process WGBS samples with trimming parameters to remove methylation bias saw in M-Bias plot.

methseq wgbs \
    --fastq /path/to/wgbs_fastqs \
    --five_prime_clip_R1 16 \
    --three_prime_clip_R1 16 \
    --five_prime_clip_R2 16 \
    --three_prime_clip_R2 16 \
    /path/to/wgbs_result

To process a single sample (EMSeq) passing path to specific file. It is useful to run MethSeq on files that are not in the same directory.

methseq emseq \
    --fastq_1 /path/to/emseq_1.fastq.gz \
    --fastq_2 /path/to/emseq_2.fastq.gz \
    /path/to/wgbs_result

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.