This project forked from jason-fer/theano-dbns-for-drug-discovery


Massively Multitask Neural Networks for Drug Discovery


+-------------------------------------------------------------------------------
| Massively Multitask Deep Learning for Drug Discovery
| Professor: Anthony Gitter
| Researchers: Jason Feriante, Vaidhyanathan Venkiteswaran (Vee)
| Email: [email protected], [email protected]
| Semester: CS790 Summer 2015
| Date: 15 June 2015
+-------------------------------------------------------------------------------

+-------------------------------------------------------------------------------
| Conventions:
+-------------------------------------------------------------------------------

Filename prefixes indicate which library a script uses:
sk_ = scikit-learn
th_ = Theano

e.g. sk_logistic_regression means this file uses scikit-learn's implementation

+-------------------------------------------------------------------------------
| SBEL Jobs:
+-------------------------------------------------------------------------------
sbel_jobs: configurations for the PBS-Torque scheduler. Also see the README
in this folder for a few extra notes on how to submit & monitor SBEL jobs.
sbel_config: configuration files needed to run SBEL jobs. These must be in
the root of your home directory on the SBEL server.

-View examples in sbel_jobs

+-------------------------------------------------------------------------------
| HTCondor Jobs:
+-------------------------------------------------------------------------------
-View examples in wid_jobs

+-------------------------------------------------------------------------------
| Install Requirements:
+-------------------------------------------------------------------------------

-scikit-learn (comes with Anaconda)
-pip install Pillow (may be needed for the DBN image; this dependency can possibly be removed.)


+-------------------------------------------------------------------------------
| Fold Data:
+-------------------------------------------------------------------------------
To generate the fold structures, use the 1024-bit strings provided by Spencer
and run:
$ python generate_folds.py

The data is organized in folders:
DUD-E: 102 targets
MUV: 17 targets
Tox21: 12 targets
PCBA: 128 targets

For each target within each dataset there are always exactly two files:
<target_name>_actives.fl
<target_name>_inactives.fl

-Each file contains a shuffled version of the original data
-Aside from being shuffled, the data is identical (1:1) to the original data
Spencer gave us. The only additions are the sha1 hash and the fold_id, which we
needed for our own deep learning experiments
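
A minimal sketch of how rows like these could be produced. It is not the actual
generate_folds.py. It assumes the hash is the SHA-1 of the ASCII text of the bit
string and that fold ids are assigned round-robin after shuffling. The function
names (sha1_of_bits, make_rows) and the seed are hypothetical.

```python
import hashlib
import random

NUM_FOLDS = 5  # each file is divided into 5 folds


def sha1_of_bits(bitstring):
    # Assumption: the hash is taken over the ASCII text of the bit string.
    return hashlib.sha1(bitstring.encode("ascii")).hexdigest()


def make_rows(records, is_active, seed=0):
    """Build shuffled output rows for one target.

    records: list of (native_id, bitstring) pairs.
    Returns strings of the form
    <sha1 hash> <is_active> <native_id> <fold_id> <bitstring>.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # hypothetical fixed seed
    rows = []
    for i, (native_id, bits) in enumerate(shuffled):
        fold_id = i % NUM_FOLDS  # round-robin: fold sizes differ by at most 1
        fields = [sha1_of_bits(bits), str(is_active), native_id, str(fold_id), bits]
        rows.append(" ".join(fields))
    return rows
```

If actives and inactives are folded separately like this, each fold keeps
roughly the same active/inactive ratio as the full target.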

The data in each row of each file is delimited by a space:
<sha1 hash> <is_active> <native_id> <fold_id> <1024 bitstring>


<sha1 hash> = a sha1 hash for building hashmaps or matrix models;
i.e. hash = sha1_hash(<1024 bitstring>)
<is_active> = {0, 1}: inactive = 0, active = 1
<native_id> = the ID used by this particular dataset (e.g. the ID used
by Tox21)
<fold_id> = each file is divided into 5 folds used for testing, validation, and
training; this was useful for our stratified k-fold experiments
<1024 bitstring> = the 1024-bit ECFP4 fingerprint you generated for us

All rows in all files are organized the same way, for example:
1aadeae8f414b55798d95610ba68ed2d72c21430 1 CHEMBL472111 0 0000000000000000000010000000000000000000000100000000000000000000100000000000000010000100000000000000000000000000000000000100000001000000000000000001000000000010000000000001000000000000000000000000000000001000010110000000000000000000000000000010000000100000000000000000000000000000000000000000000000001001000000000000000000000000000001000000000000000000000010000100000000000000001000000000000000000000000000000000100001000000100000000101000000000000000000001010000000000000001000000000000000000000000000100000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000010000010000000000000001000000000000000000000000000000000000000001100010000001000000000000000000000000001000000000000010000000000000100010100100000000000001000000000000000000001000001010000100000000001000000000000001000000000010000000000000001000000000010000000000000100000000010000010010001000000000000000010000000100000000000000000000000000000000000000000000000000000000000
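
A row in this format could be read back roughly as follows. This is a sketch
under the format described above, not code from this repository. The names Row,
parse_row, and split_by_fold are hypothetical, and using one fold for testing,
one for validation, and the rest for training is an assumption.

```python
from collections import namedtuple

Row = namedtuple("Row", "sha1 is_active native_id fold_id bits")


def parse_row(line):
    # Fields are space-delimited; the bit string itself contains no spaces.
    sha1, is_active, native_id, fold_id, bits = line.split()
    if len(bits) != 1024:
        raise ValueError("expected a 1024-bit string, got %d bits" % len(bits))
    return Row(sha1, int(is_active), native_id, int(fold_id),
               [int(b) for b in bits])


def split_by_fold(rows, test_fold, valid_fold):
    # Assumed split: one test fold, one validation fold, the rest for training.
    train = [r for r in rows if r.fold_id not in (test_fold, valid_fold)]
    valid = [r for r in rows if r.fold_id == valid_fold]
    test = [r for r in rows if r.fold_id == test_fold]
    return train, valid, test
```

Rotating test_fold and valid_fold over the fold ids gives a k-fold style
evaluation over the same files.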

+-------------------------------------------------------------------------------
| HTCondor Job List
+-------------------------------------------------------------------------------
Beware: HTCondor has a 2-3 day limit

condor_submit wid-stnn_dbn-pcba.sub

condor_q                                  # list jobs (like 'qstat')
condor_rm <job_id>                        # remove a job
condor_rm -constraint 'JobStatus =!= 2'   # kill jobs not running
condor_rm <username>                      # wipes out ALL jobs for the user
condor_release <username>                 # release held jobs
condor_submit <file>.sub                  # submit a job

condor_submit wid-logistic_regression-tox21.sub
condor_submit wid-logistic_regression-dude.sub
condor_submit wid-logistic_regression-muv.sub
condor_submit wid-logistic_regression-pcba.sub

condor_submit wid-random_forests-tox21.sub
condor_submit wid-random_forests-dude.sub
condor_submit wid-random_forests-muv.sub
condor_submit wid-random_forests-pcba.sub

condor_submit wid-stnn_dbn-tox21.sub
condor_submit wid-stnn_dbn-dude.sub
condor_submit wid-stnn_dbn-muv.sub
condor_submit wid-stnn_dbn-pcba.sub
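
The submit files above follow a wid-<model>-<dataset>.sub naming pattern, so the
full list of commands can be generated rather than typed by hand. The sketch
below only builds the command strings and does not call HTCondor. The function
name submit_commands is hypothetical.

```python
import itertools

MODELS = ["logistic_regression", "random_forests", "stnn_dbn"]
DATASETS = ["tox21", "dude", "muv", "pcba"]


def submit_commands():
    # One command per (model, dataset) pair, in the order listed above.
    return ["condor_submit wid-%s-%s.sub" % (model, dataset)
            for model, dataset in itertools.product(MODELS, DATASETS)]


if __name__ == "__main__":
    for cmd in submit_commands():
        print(cmd)
```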
