DeepR2cov

A deep representation on heterogeneous drug network, termed DeepR2cov, to discover potential agents for treating the excessive inflammatory response in COVID-19 patients.

Data description

  • Example_metapath: A representative subset of meta paths.
  • CMapscore: Connectivity Map scores for 2439 drug compounds, based on up- and down-regulated genes of SARS patients.

Requirements

DeepR2cov is tested to work under:

  • Python 3.6
  • Tensorflow 1.1.4
  • tflearn
  • numpy 1.14.0
  • sklearn 0.19.0

Quick start

  • Download the source code of BERT.

  • Manually replace run_pretraining.py. The network representation model and training regime in DeepR2cov are similar to the original implementation described in "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", so the network representation code can be downloaded from https://github.com/google-research/bert. BERT, however, is pre-trained on a combination of two tasks: masked language modeling and next sentence prediction (classifying whether two sentences are consecutive). Unlike sentences in natural language, meta paths have no consecutive relationship, so DeepR2cov does not use the consecutive-sentence task. To run DeepR2cov, manually replace the run_pretraining.py in BERT with the run_pretraining.py provided in this repository.

  • Download the BERT-Base, Uncased model: 12-layer, 768-hidden, 12-heads. You can construct a vocab file (vocab.txt) of nodes and modify the config file (bert_config.json) which specifies the hyperparameters of the model.

  • Run create_pretraining_data.py to mask the meta path samples.

 python create_pretraining_data.py \
  --input_file=../example_metapath.txt \
  --output_file=../tf_examples.tfrecord \
  --vocab_file=../uncased_L-12_H-768_A-12/vocab.txt \
  --do_lower_case=True \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --masked_lm_prob=0.15 \
  --random_seed=12345 \
  --dupe_factor=5

max_predictions_per_seq is the maximum number of masked tokens to predict per meta path sample, and masked_lm_prob is the probability that a token is masked.
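
To make the interaction of these two flags concrete, here is a minimal sketch of how the number of masked positions per sample can be derived from them. It follows the rule that the stock create_pretraining_data.py appears to use (proportional to the sequence length, at least 1, capped by max_predictions_per_seq). The function names and node names are hypothetical, and the sketch always substitutes [MASK], whereas the BERT script also leaves some selected tokens unchanged or swaps in random ones.

```python
import random

SPECIAL = ("[CLS]", "[SEP]")  # BERT never masks these markers


def choose_mask_positions(tokens, masked_lm_prob=0.15,
                          max_predictions_per_seq=20, seed=12345):
    """Return the sorted positions of one tokenized meta path to mask."""
    rng = random.Random(seed)
    candidates = [i for i, tok in enumerate(tokens) if tok not in SPECIAL]
    rng.shuffle(candidates)
    # Proportional to length, at least 1, capped by max_predictions_per_seq.
    num_to_predict = min(max_predictions_per_seq,
                         max(1, int(round(len(tokens) * masked_lm_prob))))
    return sorted(candidates[:num_to_predict])


def apply_mask(tokens, positions):
    """Replace the chosen positions with [MASK] (simplified)."""
    masked = list(tokens)
    for i in positions:
        masked[i] = "[MASK]"
    return masked


# Hypothetical meta path of 10 nodes, wrapped in BERT's sequence markers.
path = ["[CLS]"] + ["node%d" % i for i in range(10)] + ["[SEP]"]
positions = choose_mask_positions(path)
masked_path = apply_mask(path, positions)
print(positions)
print(masked_path)
```

With 12 tokens and masked_lm_prob=0.15, round(12 × 0.15) = round(1.8) = 2 positions are masked. The cap of 20 only matters for sequences of roughly 130 tokens or more, which max_seq_length=128 already excludes.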

  • Run run_pretraining.py to train a network representation model.

 python run_pretraining.py \
  --input_file=../tf_examples.tfrecord \
  --output_dir=../RLearing_output \
  --do_train=True \
  --do_eval=True \
  --bert_config_file=../uncased_L-12_H-768_A-12/bert_config.json \
  --train_batch_size=32 \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --num_train_steps=20 \
  --num_warmup_steps=10 \
  --learning_rate=2e-5

  • Run extract_features.py to obtain the low-dimensional representation vectors of vertices.

 python extract_features.py \
  --input_file=../node.txt \
  --output_file=../output.jsonl \
  --vocab_file=../uncased_L-12_H-768_A-12/vocab.txt \
  --bert_config_file=../uncased_L-12_H-768_A-12/bert_config.json \
  --init_checkpoint=../RLearing_output/bert_model.ckpt \
  --layers=-1,-2,-3,-4 \
  --max_seq_length=128 \
  --batch_size=8
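
The resulting output.jsonl can be turned into one vector per node with a few lines of Python. The sketch below is an illustration, not part of DeepR2cov: it assumes the file follows the format written by the stock extract_features.py (one JSON object per line with a "linex_index" field and a "features" list, each entry holding a "token" and its per-layer "values"), and that each line of node.txt holds a single node. The function names and the choice to average the requested layers are assumptions.

```python
import json

SPECIAL = ("[CLS]", "[SEP]")


def mean_vector(vectors):
    """Element-wise mean of equally sized lists of floats."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def parse_node_vectors(lines):
    """Map each input line index to one vector.

    Layers are averaged per token, then tokens are averaged per line,
    skipping the [CLS] and [SEP] markers.
    """
    vectors = {}
    for line_number, line in enumerate(lines):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        token_vectors = [
            mean_vector([layer["values"] for layer in feature["layers"]])
            for feature in record["features"]
            if feature["token"] not in SPECIAL
        ]
        if token_vectors:
            # "linex_index" (sic) is the key assumed from the BERT script.
            key = record.get("linex_index", line_number)
            vectors[key] = mean_vector(token_vectors)
    return vectors
```

A typical call would be `with open("../output.jsonl", encoding="utf-8") as handle: vectors = parse_node_vectors(handle)`, after which `vectors[0]` holds the representation of the first node listed in node.txt.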

  • Run PDI_drug_cov.py to predict the confidence scores between drugs and TNF-α/IL-6.

 python PDI_drug_cov.py

  • Run top_rank.py to select the top 20 high-confidence drugs binding to TNF-α and IL-6, respectively.

 python top_rank.py
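
The Quick start asks for a vocab file (vocab.txt) of nodes but does not show how to build one. The following is a minimal sketch under stated assumptions: each line of the meta path file is taken to be a whitespace-separated sequence of node identifiers, the identifiers are alphanumeric (BERT's tokenizer splits on punctuation), and the function name build_vocab is hypothetical. [PAD] is placed first because BERT pads input ids with 0.

```python
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]


def build_vocab(metapath_lines, lowercase=True):
    """Collect the special tokens plus every distinct node, in first-seen order."""
    vocab = list(SPECIAL_TOKENS)
    seen = set(vocab)
    for line in metapath_lines:
        for node in line.split():
            # Match --do_lower_case=True, which lowercases the input text.
            token = node.lower() if lowercase else node
            if token not in seen:
                seen.add(token)
                vocab.append(token)
    return vocab


# Hypothetical meta paths; real node identifiers come from example_metapath.txt.
example = ["Drug1 protein7 drug1", "protein7 disease3"]
vocab = build_vocab(example)
print("\n".join(vocab))
```

Writing one token per line to vocab.txt gives the layout the --vocab_file flag expects. The vocab_size entry in bert_config.json should then be set to the number of lines in that file.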

Please cite our paper if you use this code and data in your work.

@article{DeepR2cov2021,
title = {DeepR2cov: deep representation learning on heterogeneous drug networks to discover anti-inflammatory agents for COVID-19},
author = {Wang, Xiaoqi and Xin, Bin and Tan, Weihong and Xu, Zhijian and Li, Kenli and Li, Fei and Zhong, Wu and Peng, Shaoliang},
journal = {Briefings in Bioinformatics},
year = {2021},
doi = {10.1093/bib/bbab226}
}

Contacts

If you have any questions or comments, please feel free to email: [email protected].


