GithubHelp home page GithubHelp logo

cpgenie's Introduction

CpGenie

A deep learning method for predicting DNA methylation level of CpG sites from the sequence context, and predicting non-coding variants' effects on DNA methylation.

Dependencies

Note: You can run CpGenie on CPU by replacing all the nvidia-docker below with docker, but it will be extremely slow and thus highly not recommended.

Predict DNA methylation level of CpG sites

  • Prepare a FASTA file (example) of the 1001 bp sequence context centered at the CpG you wish to predict, one sequence for one CpG. The 501 nucleotide should be the 'C' of the CpG.

  • Process the FASTA file and make predictions with 50 CpGenie models trained on RRBS datasets from ENCODE immortal cell lines.

     	docker pull haoyangz/cpgenie:CUDA_VER
     	mkdir -p OUTPUT_DIR
     	nvidia-docker run -u $(id -u) -v FASTA_FILE:/in.fa -v OUTPUT_DIR:/outdir \
     			--rm haoyangz/cpgenie:CUDA_VER python main.py ORDER -cpg_fa /in.fa -cpg_out /outdir
    • CUDA_VER: 'cuda7.0' or 'cuda8.0' depending on your NVIDIA driver version.
    • FASTA_FILE: the absolute path to the FASTA file of sequences to predict.
    • OUTPUT_DIR: the absolute path to the output directory, under which the prediction from each of the 50 CpGenie models will be saved.
    • ORDER: the following orders can be both used and seperated by space:
      • -embed: data preprocessing. The output will be saved under $OUTPUT_DIR/embedded.h5.
      • -cpg: make prediction and save under $OUTPUT_DIR/CpGenie_pred.
    • If you wish to only predict for a subset of the 50 CpGenie models
      • Download the 50 CpGenie models to a customized directory (e.g. YOUR_MODEL_DIR, an absolute path)

         	mkdir -p YOUR_MODEL_DIR
         	cd YOUR_MODEL_DIR
         	wget http://gerv.csail.mit.edu/CpGenie_models.tar.gz
         	tar -zxvf CpGenie_models.tar.gz
      • Keep only the models you want

      • Run the following instead:

         	docker pull haoyangz/cpgenie:CUDA_VER
         	mkdir -p OUTPUT_DIR
         	nvidia-docker run -u $(id -u) -v FASTA_FILE:/in.fa -v OUTPUT_DIR:/outdir \
         			-v YOUR_MODEL_DIR:/modeldir --rm haoyangz/cpgenie:CUDA_VER \
         			python main.py ORDER -cpg_fa /in.fa -cpg_out /outdir \
         				-modeltop /modeldir/models

Predict the functional score of sequence variants

  • Prepare the variants to score in VCF file (example).

  • Score each variant by the predicted impact on DNA methylation in 50 RRBS datasets.

     	docker pull haoyangz/cpgenie:CUDA_VER
     	mkdir -p OUTPUT_DIR
      	nvidia-docker run -u $(id -u) -v VCF_FILE:/in.vcf -v OUTPUT_DIR:/outdir \
     			--rm haoyangz/cpgenie:CUDA_VER python main.py ORDER -var_vcf /in.vcf -var_outdir /outdir
    • CUDA_VER: 'cuda7.0' or 'cuda8.0' depending on your NVIDIA driver version.
    • VCF_FILE: the absolute path to the VCF file to score.
    • OUTPUT_DIR: the absolute path to the output directory.
    • ORDER: the following orders can be both used and seperated by space:
      • -var_prep: find all CpG sites within 500 bp to each variant and prepare the right format under $OUTPUT_DIR/CpGenie_processed.
      • -var_score: make predictions on the data preprocessed in the previous step. For each variant, the predicted absolute change of the following metrics are generated for each of the 50 RRBS datasets, resulting in a 250-dim feature vector. The output will be saved as $OUTPUT_DIR/CpGenie_var_pred, where the first line is a header of feature names and then one line for each variant's 250-dim feature vector.
        • sum of methylation within 500 bp
        • max methylation within 500 bp
        • log odds of the max methylation within 500 bp
        • mean methylation within 500 bp
        • log odds of the mean methylation within 500 bp

cpgenie's People

Contributors

haoyangz avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.