kepbod / dna-me-pipeline

This project forked from encode-dcc/dna-me-pipeline

DCC/DAC methylation pipeline source

License: MIT License

# DCC/DAC methylation pipeline source (runs on both single-end and paired-end data)

## Pipeline Overview

The ENCODE Whole-Genome Bisulfite Sequencing (WGBS) pipeline is used for discovering methylation patterns at single-base resolution. Bisulfite treatment converts unmethylated cytosines into uracils but leaves methylated cytosines unchanged. After mapping bisulfite sequencing reads against a Bismark-transformed genome, the pipeline extracts the CpG, CHG, and CHH methylation patterns genome-wide. The WGBS pipeline takes as input gzipped DNA-sequencing reads (FASTQ files) and a Bismark-transformed, Bowtie-indexed genome in a tar.gz archive file. These are processed to generate BAM alignment files, which in turn produce:

  • The methylation state at CpG (bedMethyl.gz and bigBed)
  • The methylation state at CHG (bedMethyl.gz and bigBed)
  • The methylation state at CHH (bedMethyl.gz and bigBed)
  • Raw signal files of all reads (bigWig)
  • SAMtools and Bismark quality metrics
  • Pearson correlation, calculated from the two replicates' methylation states at CpG.
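
As an illustration of the last item, the replicate correlation could be computed along the following lines. This is a minimal sketch, not the pipeline's own code: the function names `pearson` and `replicate_correlation`, the dictionary layout keyed by `(chromosome, start)`, and the choice to compare only sites present in both replicates are all assumptions made for the example. The real pipeline may apply coverage filters or other steps not shown here.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / math.sqrt(var_x * var_y)


def replicate_correlation(rep1, rep2):
    """Correlate percent methylation over CpG sites covered in both replicates.

    rep1 and rep2 map a site key, here (chromosome, start), to the percent
    methylation at that site. This layout is hypothetical.
    """
    shared = sorted(rep1.keys() & rep2.keys())
    return pearson([rep1[k] for k in shared], [rep2[k] for k in shared])


if __name__ == "__main__":
    # Made-up values for illustration only.
    rep1 = {("chr1", 100): 0.0, ("chr1", 200): 25.0, ("chr1", 300): 50.0}
    rep2 = {("chr1", 100): 10.0, ("chr1", 200): 35.0, ("chr1", 300): 60.0}
    print(replicate_correlation(rep1, rep2))
```

Restricting the comparison to shared sites keeps the two value lists aligned; sites covered in only one replicate are ignored in this sketch.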

### Description of bedMethyl file

The bedMethyl file is a bed9+2 file containing the number of reads and the percent methylation. Each column represents the following:

  1. Reference chromosome or scaffold
  2. Start position in chromosome
  3. End position in chromosome
  4. Name of item
  5. Score from 0 to 1000 (the number of reads, capped at 1000)
  6. Strandedness, plus (+), minus (-), or unknown (.)
  7. Start of where display should be thick (start codon)
  8. End of where display should be thick (stop codon)
  9. Color value (RGB)
  10. Coverage, or number of reads
  11. Percentage of reads that show methylation at this position in the genome
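
The column layout above can be read with a few lines of Python. The sketch below is only illustrative: the `BedMethylRecord` type, the `parse_bedmethyl_line` function, and the sample line are hypothetical and do not come from the pipeline. It assumes whitespace-delimited fields and reads the percentage as a float.

```python
from typing import NamedTuple


class BedMethylRecord(NamedTuple):
    """One row of a bedMethyl (bed9+2) file, in column order."""
    chrom: str                 # 1. reference chromosome or scaffold
    start: int                 # 2. start position
    end: int                   # 3. end position
    name: str                  # 4. name of item
    score: int                 # 5. score from 0 to 1000
    strand: str                # 6. "+", "-", or "."
    thick_start: int           # 7. start of thick display
    thick_end: int             # 8. end of thick display
    item_rgb: str              # 9. color value, e.g. "255,0,0"
    coverage: int              # 10. number of reads
    percent_methylated: float  # 11. percentage of reads showing methylation


def parse_bedmethyl_line(line: str) -> BedMethylRecord:
    """Parse a single bedMethyl line into a typed record."""
    fields = line.split()
    if len(fields) < 11:
        raise ValueError(f"expected 11 columns, found {len(fields)}")
    return BedMethylRecord(
        chrom=fields[0],
        start=int(fields[1]),
        end=int(fields[2]),
        name=fields[3],
        score=int(fields[4]),
        strand=fields[5],
        thick_start=int(fields[6]),
        thick_end=int(fields[7]),
        item_rgb=fields[8],
        coverage=int(fields[9]),
        percent_methylated=float(fields[10]),
    )


if __name__ == "__main__":
    # Made-up example row, not real pipeline output.
    sample = "chr1\t10468\t10469\t.\t12\t+\t10468\t10469\t255,0,0\t12\t100"
    print(parse_bedmethyl_line(sample))
```

To read a gzipped bedMethyl.gz file, the same function could be applied to each line yielded by `gzip.open(path, "rt")`.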

### Genomic References Used in this Pipeline

#### References

Krueger, Felix, and Simon R. Andrews. "Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications." Bioinformatics 27.11 (2011): 1571-1572.

Tsuji, Junko, and Zhiping Weng. "Evaluation of preprocessing, mapping and postprocessing algorithms for analyzing whole genome bisulfite sequencing data." Briefings in bioinformatics (2015): bbv103.
