aspirincode / theano-dbns-for-drug-discovery Goto Github PK
View Code? Open in Web Editor NEWThis project forked from jason-fer/theano-dbns-for-drug-discovery
Massively Multitask Neural Networks for Drug Discovery
This project forked from jason-fer/theano-dbns-for-drug-discovery
Massively Multitask Neural Networks for Drug Discovery
+------------------------------------------------------------------------------- | Massively Multitask Deep Learning for Drug Discovery | Professor: Anthony Gitter | Researchers: Jason Feriante, Vaidhyanathan Venkiteswaran (Vee) | Email: [email protected], [email protected] | Semester: CS790 Summer 2015 | Date: 15 June 2015 +------------------------------------------------------------------------------- +------------------------------------------------------------------------------- | Conventions: +------------------------------------------------------------------------------- prepended names represent different libraries: sk_ = scikit learn th_ = theano e.g. sk_logistic_regression means this file uses scikit learn's version +------------------------------------------------------------------------------- | SBEL Jobs: +------------------------------------------------------------------------------- sbel_jobs: configurations for the PBS-Torque Schedular. Also see the README in this folder for a few extra notes on how to submit & monitor SBEL jobs. sbel_config: configuration files you need if you want to run SBEL jobs. These need to be in your root home directory on the SBEL server. -View examples in sbel_jobs +------------------------------------------------------------------------------- | HTCondor Jobs: +------------------------------------------------------------------------------- -View examples in wid_jobs +------------------------------------------------------------------------------- | Install Requirements: +------------------------------------------------------------------------------- -scikit learn (comes with anaconda) -pip install Pillow (maybe needed for DBN image; maybe this can be removed.) +------------------------------------------------------------------------------- | Fold Data: +------------------------------------------------------------------------------- The generate the fold structures, use the 1024 bit strings provided by Spencer and run: $ python generate_folds.py The data is organized in folders: DUD-E: 102 targets MUV: has 17 targets Tox21: 12 targets PCBA: 128 targets For each target within each dataset there's always exactly 2x files: <target_name>_actives.fl <target_name>_inactives.fl -Each file contains a 'shuffled' version of the original data -Aside from being shuffled, it is identical to the original data Spencer gave us (1:1), the big difference is the assigned sha1 hash and the fold_id which we needed for our own deep learning experiments The data in each row of each file is delimited by a space: <sha1 hash> <is_active> <native_id> <fold_id> <1024 bitstring> <sha1 hash> = a sha1 hash for building hashmaps or matrix models; i.e. hash = sha1_hash(<1024 bit string>) <is_active> = {0, 1} inactive = 0 or active = 1 <native_id> = this is the id used by this particular dataset (e.g. the ID used by Tox21) <fold_id> = each file is divided into 5 folds used for testing, validation, and training... this was useful for our stratified k-fold experiments <1024 bitstring> = the 1024 bitstring ECP4 fingerprint you generated for us All rows in all files are organized the same: 1aadeae8f414b55798d95610ba68ed2d72c21430 1 CHEMBL472111 0 0000000000000000000010000000000000000000000100000000000000000000100000000000000010000100000000000000000000000000000000000100000001000000000000000001000000000010000000000001000000000000000000000000000000001000010110000000000000000000000000000010000000100000000000000000000000000000000000000000000000001001000000000000000000000000000001000000000000000000000010000100000000000000001000000000000000000000000000000000100001000000100000000101000000000000000000001010000000000000001000000000000000000000000000100000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000010000010000000000000001000000000000000000000000000000000000000001100010000001000000000000000000000000001000000000000010000000000000100010100100000000000001000000000000000000001000001010000100000000001000000000000001000000000010000000000000001000000000010000000000000100000000010000010010001000000000000000010000000100000000000000000000000000000000000000000000000000000000000 +------------------------------------------------------------------------------- | HTCondor Job List +------------------------------------------------------------------------------- Beware: HTCondor has 2-3 day limit condor_submit wid-stnn_dbn-pcba.sub condor_q to do a 'qstat' condor_rm remove job condor_rm -constraint 'JobStatus =!= 2' # kill jobs not running condor_rm <username> # wipes out ALL jobs for user condor_release <username> # release held jobs. condor_submit <file>.sh condor_submit wid-logistic_regression-tox21.sub condor_submit wid-logistic_regression-dude.sub condor_submit wid-logistic_regression-muv.sub condor_submit wid-logistic_regression-pcba.sub condor_submit wid-random_forests-tox21.sub condor_submit wid-random_forests-dude.sub condor_submit wid-random_forests-muv.sub condor_submit wid-random_forests-pcba.sub condor_submit wid-stnn_dbn-tox21.sub condor_submit wid-stnn_dbn-dude.sub condor_submit wid-stnn_dbn-muv.sub condor_submit wid-stnn_dbn-pcba.sub
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.