AgentBind

AgentBind is a machine-learning framework for analyzing the context regions of binding sites and identifying specific non-coding nucleotides with strong effects on binding activity. This repository contains the code for the classification and visualization experiments with the DanQ and DeepSEA architectures.

Preprint: https://www.biorxiv.org/content/10.1101/2020.02.26.965343v1.full

System Requirements & Installation

All experiments were run on CentOS Linux 7 (Core) with Python v2.7.5. Before running the code, make sure the following tools and libraries are installed.

Install FIMO from the MEME-suite

You can download the MEME-suite from http://meme-suite.org/doc/download.html. This gives you a package of tools that includes FIMO. You also need to add the MEME-suite bin directory to your PATH so that FIMO can be called by name:

export PATH={YOUR-PATH}/MEME-Suite/bin:$PATH
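To confirm that the PATH change took effect, a small check like the one below can help. It is only a sketch: the helper name `find_on_path` is made up for this example and is not part of AgentBind. It searches PATH by hand so that it also runs on Python 2.7.

```python
from __future__ import print_function
import os


def find_on_path(program, path=None):
    """Return the full path of an executable named `program`, or None.

    `path` defaults to the PATH environment variable of the current process.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, program)
        # Accept only regular files that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    location = find_on_path("fimo")
    if location is None:
        print("fimo not found; check that {YOUR-PATH}/MEME-Suite/bin is on PATH")
    else:
        print("fimo found at", location)
```

Run it in the same shell session in which you exported PATH, since the export only affects that session and its child processes.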

Python libraries

Our code requires external Python libraries, including TensorFlow v1.9.0 (GPU version), Biopython v1.71, NumPy v1.15.4, six v1.14.0, scikit-image v0.14.5, scikit-learn, and matplotlib. You can install them with the pip package manager:

pip install numpy six matplotlib biopython sklearn scikit-image tensorflow-gpu==1.9.0
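Because the pip command pins only TensorFlow, the other libraries may end up at versions different from the ones listed above. The following sketch compares installed versions against the listed ones. The helper names `installed_version` and `report` are hypothetical, and the import names (`Bio` for Biopython, `skimage` for scikit-image) are the usual ones for these packages rather than something this README states.

```python
import importlib

# Versions quoted in this README, keyed by import name (assumed mapping).
PINNED = {
    "tensorflow": "1.9.0",
    "Bio": "1.71",        # biopython
    "numpy": "1.15.4",
    "six": "1.14.0",
    "skimage": "0.14.5",  # scikit-image
}


def installed_version(module_name):
    """Return module.__version__, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def report(pinned=PINNED):
    """Return (name, wanted, found, matches) tuples, sorted by module name."""
    rows = []
    for name in sorted(pinned):
        found = installed_version(name)
        rows.append((name, pinned[name], found, found == pinned[name]))
    return rows
```

Calling `report()` inside the environment you plan to use returns one tuple per library; a `found` value of `None` means the import failed.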

Data Download

Data for experiments with the DanQ architecture https://drive.google.com/file/d/12mrLk9Ci7u2tKB8kuqldGXE9ghAzpbUk/view?usp=sharing

Data for experiments with the DeepSEA architecture https://drive.google.com/file/d/1UaaqgFlce9FSaBX2RoIz9pDaXacwQ3lW/view?usp=sharing

Run

AgentBind.py is the main Python script; it executes all the experiments.

Required parameters:

  • --datadir: the directory where you stored the downloaded data.
  • --motif: a text file containing the names, motifs, and ChIPseq files of TFs of interest. This text file can be found in the given data at {your-data-path}/table_matrix/table_core_motifs.txt.
  • --workdir: the directory in which to store all intermediate/oversized files, including the trained models, one-hot-encoded input sequences, and Grad-CAM annotation scores.
  • --resultdir: the directory in which to store all the results.

To run AgentBind, you can simply execute:

python AgentBind.py \
--motif {your-data-path}/table_matrix/table_core_motifs.txt \
--workdir {your-work-path} \
--datadir {your-data-path} \
--resultdir {your-result-path}
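If you prefer to launch the run from another Python script, the same command can be assembled as an argument list. This is a sketch only: `build_command` is a hypothetical helper, not part of AgentBind, and the directory names in the demo are placeholders. The default motif path follows the location given above, {your-data-path}/table_matrix/table_core_motifs.txt.

```python
from __future__ import print_function
import os


def build_command(datadir, workdir, resultdir, motif=None, script="AgentBind.py"):
    """Return the AgentBind.py invocation as a list suitable for subprocess."""
    if motif is None:
        # Default location of the motif table inside the downloaded data.
        motif = os.path.join(datadir, "table_matrix", "table_core_motifs.txt")
    return [
        "python", script,
        "--motif", motif,
        "--workdir", workdir,
        "--datadir", datadir,
        "--resultdir", resultdir,
    ]


if __name__ == "__main__":
    # Placeholder directories; replace them with your own paths.
    command = build_command("data", "work", "results")
    print(" ".join(command))
    # To actually start the run:
    #   import subprocess
    #   subprocess.check_call(command)
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.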

AgentBind reports results for two situations: core motifs present (c) and core motifs blocked (b). The corresponding classification results (AUC curves) are in {your-result-path}/{b or c}/{TF-name}+GM12878/, and the Grad-CAM annotation scores are in {your-work-path}/{TF-name}+GM12878/seqs_one_hot_{b or c}/vis-weights-total/weight.txt.
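The output locations described above can be built programmatically, which is handy when collecting results for many TFs. The sketch below only assembles paths from the patterns in this README. The function names are hypothetical, and "CTCF" in the usage note is a placeholder TF name, not a statement about which TFs the data contains.

```python
import os

CELL_LINE = "GM12878"
SITUATIONS = ("b", "c")  # b: core motifs blocked, c: core motifs present


def _check_situation(situation):
    if situation not in SITUATIONS:
        raise ValueError("situation must be 'b' or 'c', got %r" % (situation,))


def result_dir(result_path, tf_name, situation):
    """Directory holding the classification results for one TF."""
    _check_situation(situation)
    return os.path.join(result_path, situation, "%s+%s" % (tf_name, CELL_LINE))


def gradcam_weight_file(work_path, tf_name, situation):
    """Path of the Grad-CAM annotation score file for one TF."""
    _check_situation(situation)
    return os.path.join(
        work_path,
        "%s+%s" % (tf_name, CELL_LINE),
        "seqs_one_hot_%s" % situation,
        "vis-weights-total",
        "weight.txt",
    )
```

For example, `gradcam_weight_file("work", "CTCF", "c")` gives the score file for the core-motif-present run of the placeholder TF.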

Running AgentBind.py takes roughly 24-48 hours. If you need only the Grad-CAM annotation scores, you can directly download them here (DanQ version only):

For questions on usage, please open an issue, submit a pull request, or contact Melissa Gymrek ([email protected]) or An Zheng ([email protected]).
