GithubHelp home page GithubHelp logo

aspirincode / symbolic_rxn Goto Github PK

View Code? Open in Web Editor NEW

This project forked from wesleyyc/symbolic_rxn

0.0 1.0 0.0 47.68 MB

Integrating Deep Neural Networks and Symbolic Inference for Organic Reactivity Prediction

License: GNU General Public License v3.0

Python 98.41% Shell 1.59%

symbolic_rxn's Introduction

Symbolic RXN

Integrating Deep Neural Networks and Symbolic Inference for Organic Reactivity Prediction [ChemRxiv]

Code for the inference pipeline is here, and the complete training pipeline will be updated once the paper is accepted.

System Requirements

  • conda=4.7.10 (from here with Python 3.6.9)
  • tensorflow-gpu=1.12.0 (from conda, and GPU is recommended)
  • rdkit=2019.09.1 (from conda)
  • gurobi=8.1.1 (from conda, and free academic licence can be found here)
  • h5py=2.9.0(from conda)
  • other softwares required by the above packages

Installation Guide

Follwoings are the intructions for installing our software and the dependencies specified above:

# the total installation time should be less than 15 minutes

# download the source code
git clone https://github.com/WesleyyC/symbolic_rxn.git
cd symbolic_rxn

# create a py3 conda env
conda create -n symbolic_rxn python=3
conda activate symbolic_rxn  

# install the dependencies
# tensorflow-gpu
conda install tensorflow-gpu=1.12.0
# RDKit
conda install -c conda-forge rdkit=2019.09.1
# gurobi
conda config --add channels http://conda.anaconda.org/gurobi
conda install gurobi=8.1.1

Data Preprocessing

To better balance the GPU/CPU workload during training/inference, we precompute the graph features and store them in HDF5 format for fast random query. Followings are the instructions for digesting the raw text input into our graph feature representation.

# prepare the data
unzip data.zip

# digesting a small demo dataset w/ 2000 reactions (~30 secs)
python -m reactivity_prediction.data_digestion --input data/demo.txt --output data/hdf5_demo 

# digesting the full test dataset (~10 mins)
python -m reactivity_prediction.data_digestion --input data/test.txt --output data/hdf5_test

Downloading the Pre-Trained Models

We provided the pre-trained models for reproducing our Top-K performance on the USPTO dataset:

# download the ensembles models
wget https://www.dropbox.com/s/enlfd7vb5u7uuex/ckpt.zip

# unzip the models
unzip ckpt.zip

Running a Sample Pre-trained Model on the Small Demo Dataset

Followings are the instructions for running a single pre-trained model on the provided demo dataset:

# reactivity prediction w/ existing ckpt (<1 mins with GPU)
python -m reactivity_prediction.run_model \
  --mode infer --ckpt ckpt/mdl_0 \
  --eval_input data/hdf5_demo --eval_output demo_output

# evaluate reactivity prediction (<30 secs)
python -m reactivity_prediction.eval --input demo_output/delta_predictions.pkl
# and the program will report the bond changes coverage:
# ==========================================================
# ================ Reaction Bond Prediction ================
# ==========================================================
# as well as the delta prediction accuracy/F-1 socre:
# ==========================================================
# ==================== Delta Evaluation ====================
# ==========================================================

# run symbolic inference and evaluation (~30 mins with multi cores)
python -m octet_sampling.run_sampler --input demo_output/delta_predictions.pkl 
# and the program will report in the last row with the Top-K prediction accuracy:
# ==================================== #2000 of Reactions Evaluated ====================================
# Gurobi: Top1: ... Top2: ... Top3: ... Top5: ... Top20: ... Average Time: ...

Reproducing the Full USPTO Test Dataset with the Ensemble Models:

Followings are the instruction for reproducing our reprorted Top-k prediction accuracy on the USPTO test dataset using the ensemble models:

# running the following script to genereate ensembled predictions (~80 mins or ~10 mins/model)
bash reactivity_prediction/run_ensemble_inference.sh

# evaluate reactivity prediction (<10 mins)
python -m reactivity_prediction.eval --input ensemble_output/delta_predictions.pkl

# run symbolic inference and evaluation (<1 day)
python -m octet_sampling.run_sampler --input ensemble_output/delta_predictions.pkl

Contact

symbolic_rxn's People

Contributors

wesleyyc avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.