
Parrot

License: MIT
Implementation of reaction condition prediction with Parrot


Publication

Xiaorui Wang, Chang-Yu Hsieh*, Xiaodan Yin, Jike Wang, Yuquan Li, Yafeng Deng, Dejun Jiang, Zhenxing Wu, Hongyan Du, Hongming Chen, Yun Li, Huanxiang Liu, Yuwei Wang, Pei Luo, Tingjun Hou*, Xiaojun Yao*. Generic Interpretable Reaction Condition Predictions with Open Reaction Condition Datasets and Unsupervised Learning of Reaction Center. Research 2023;6:Article 0231. DOI:10.34133/research.0231

Quick Start from Gitpod

Setup takes about 4 minutes.
Open in Gitpod

OS Requirements

This repository has been tested on Linux operating systems.

Python Dependencies

  • Python (version >= 3.7)
  • PyTorch (version >= 1.10.0)
  • RDKit (version >= 2019)
  • Transformers (version == 4.18.0)
  • Simpletransformers (version == 0.63.6)

Installation Guide

Create a virtual environment to run Parrot.
We recommend using conda to manage the virtual environment. The installation instructions for conda can be found here.
Make sure to install PyTorch with the CUDA version that matches your device.
This process usually takes a few minutes to complete.

git clone https://github.com/wangxr0526/Parrot.git
cd Parrot
conda env create -f envs.yaml
conda activate parrot_env
pip install gdown wtforms flask flask_bootstrap
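Because Transformers and Simpletransformers are pinned to exact versions while PyTorch and RDKit only have minimums, it can help to confirm the environment before running anything. The script below is a minimal sketch, not part of the repository: it reads installed versions with the standard library's `importlib.metadata` (which needs Python 3.8 or later, although the README allows 3.7), and the distribution names it looks up are assumptions. In particular, a conda-installed RDKit may not be registered under the name `rdkit`.

```python
import sys
from importlib import metadata

# Requirements copied from the "Python Dependencies" list.
# The distribution names are assumptions and may differ for conda installs.
REQUIREMENTS = {
    "torch": (">=", "1.10.0"),
    "rdkit": (">=", "2019"),
    "transformers": ("==", "4.18.0"),
    "simpletransformers": ("==", "0.63.6"),
}


def parse_version(text):
    """Keep the leading integer of each dotted part: '1.10.0+cu113' -> (1, 10, 0)."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def satisfies(installed, op, required):
    """Compare zero-padded version tuples so that '4.18' == '4.18.0'."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a == b if op == "==" else a >= b


def check_environment(lookup=metadata.version):
    """Map each package to (installed version or None, requirement met?)."""
    report = {}
    for name, (op, required) in REQUIREMENTS.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, satisfies(installed, op, required))
    return report


if __name__ == "__main__":
    py_ok = sys.version_info >= (3, 7)
    print(f"{'python':20s} {sys.version.split()[0]:15s} {'OK' if py_ok else 'MISMATCH'}")
    for name, (installed, ok) in check_environment().items():
        print(f"{name:20s} {installed or 'not installed':15s} {'OK' if ok else 'MISMATCH'}")
```

Run it inside the activated `parrot_env`; any `MISMATCH` line points at a package to reinstall. The `lookup` parameter exists only so the check can be exercised without the real packages installed.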

Use Parrot

You can use Parrot to predict suitable catalysts, solvents, reagents, and temperatures for reactions.
First, download the model and dataset files with this command:

python preprocess_script/download_data.py

Each download link corresponds to the following zip file path:

https://drive.google.com/uc?id=1aX70qzZrJ9TZ9KpqnvUVR8WBxiTwXOsI    --->    dataset/source_dataset/USPTO_condition_final.zip

https://drive.google.com/uc?id=1uEqpkF4tPTlLIPdTyWJdXows7hKQbAAc    --->    dataset/pretrain_data.zip

https://drive.google.com/uc?id=1gFV2KdVKaLCTeb3nrzopyYHXbM0G_cr_    --->    outputs/Parrot_train_in_USPTO_Condition_enhance.zip

https://drive.google.com/uc?id=1bVB89ByGkYjiUtbvEcp1mgwmoKy5Ka2b    --->    outputs/best_rcm_model_pretrain.zip

https://drive.google.com/uc?id=1DmHILXSOhUuAzqF0JmRTx1EcOOQ7Bm5O    --->    outputs/best_mlm_model_pretrain.zip
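If the download is interrupted, a quick way to see which of the five archives above did not arrive is to check their paths. This is a minimal sketch, not part of the repository; `missing_downloads` is a hypothetical helper, and it assumes each archive unpacks into a directory named after the zip file (for example, dataset/pretrain_data.zip into dataset/pretrain_data/), which may not match what download_data.py actually does.

```python
from pathlib import Path

# Archive paths listed above, relative to the repository root.
EXPECTED_ARCHIVES = [
    "dataset/source_dataset/USPTO_condition_final.zip",
    "dataset/pretrain_data.zip",
    "outputs/Parrot_train_in_USPTO_Condition_enhance.zip",
    "outputs/best_rcm_model_pretrain.zip",
    "outputs/best_mlm_model_pretrain.zip",
]


def missing_downloads(repo_root="."):
    """List archives for which neither the .zip nor an extracted directory exists.

    Assumption: each archive extracts to a sibling directory with the same
    name minus the .zip suffix.
    """
    root = Path(repo_root)
    missing = []
    for rel in EXPECTED_ARCHIVES:
        archive = root / rel
        extracted = archive.with_suffix("")  # e.g. dataset/pretrain_data
        if not archive.is_file() and not extracted.is_dir():
            missing.append(rel)
    return missing


if __name__ == "__main__":
    gaps = missing_downloads()
    print("all downloads present" if not gaps else "missing:\n  " + "\n  ".join(gaps))
```

Run it from the Parrot directory after download_data.py finishes; re-running the download command is the simplest fix for anything listed as missing.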

Parrot can be used in two ways: from the command line or through the web interface.

Command

Prepare a .txt file containing the SMILES of the reactions you want to predict, then run the following command:

cd Parrot
python inference.py --config_path path/to/config_file.yaml \
                    --input_path path/to/input_file.txt \
                    --output_path path/to/output.csv \
                    --num_workers NUM_WORKERS \
                    --inference_batch_size BATCH_SIZE \
                    --gpu CUDA_ID          # use cpu: CUDA_ID=-1
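The input file is plain text. Assuming it holds one reaction SMILES per line (compare with test_files/input_demo.txt to confirm the exact layout), the sketch below writes such a file and rejects strings that lack the `reactants>agents>products` layout. `is_reaction_smiles` and `write_input_file` are hypothetical helpers, not part of Parrot; the check is purely syntactic, and RDKit would be needed to validate the molecules themselves.

```python
from pathlib import Path


def is_reaction_smiles(smiles):
    """Syntactic check for 'reactants>agents>products' with non-empty ends.

    Only the '>' layout is checked; the molecules are not parsed.
    """
    parts = smiles.strip().split(">")
    return len(parts) == 3 and bool(parts[0]) and bool(parts[2])


def write_input_file(reactions, path):
    """Write one reaction SMILES per line and return the number of lines written."""
    bad = [r for r in reactions if not is_reaction_smiles(r)]
    if bad:
        raise ValueError(f"not reaction SMILES: {bad}")
    lines = [r.strip() for r in reactions]
    Path(path).write_text("\n".join(lines) + "\n")
    return len(lines)
```

For example, `write_input_file(["CC(=O)O.OCC>>CC(=O)OCC"], "my_reactions.txt")` writes a one-line file (an esterification) whose path can then be passed to `--input_path`.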

For example, to predict conditions with the Parrot model trained on the USPTO-Condition dataset, run:

python inference.py --config_path configs/config_inference_use_uspto.yaml \
                    --input_path test_files/input_demo.txt \
                    --output_path test_files/predicted_conditions.csv \
                    --num_workers 6 \
                    --inference_batch_size 8 \
                    --gpu 0

To predict conditions with the Parrot model trained on the Reaxys-TotalSyn-Condition dataset, run:

# Could be used to predict temperatures.
python inference.py --config_path configs/config_inference_use_reaxys.yaml \
                    --input_path test_files/input_demo.txt \
                    --output_path test_files/predicted_conditions.csv \
                    --num_workers 6 \
                    --inference_batch_size 8 \
                    --gpu 0
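When calling Parrot from another Python script, a thin wrapper around the commands above saves retyping the flags. This is a minimal sketch, not part of the repository: `inference_command` and `run_inference` are hypothetical names, and the wrapper uses only the flags shown in the commands above, with the same example values (6 workers, batch size 8) as defaults.

```python
import subprocess
import sys


def inference_command(config_path, input_path, output_path,
                      num_workers=6, batch_size=8, gpu=0, python=sys.executable):
    """Build the argv list for inference.py; gpu=-1 selects the CPU."""
    return [
        python, "inference.py",
        "--config_path", config_path,
        "--input_path", input_path,
        "--output_path", output_path,
        "--num_workers", str(num_workers),
        "--inference_batch_size", str(batch_size),
        "--gpu", str(gpu),
    ]


def run_inference(config_path, input_path, output_path, repo_dir=".", **options):
    """Run inference.py inside the Parrot checkout.

    Raises subprocess.CalledProcessError if the script exits with an error.
    """
    cmd = inference_command(config_path, input_path, output_path, **options)
    subprocess.run(cmd, check=True, cwd=repo_dir)
```

For example, `run_inference("configs/config_inference_use_uspto.yaml", "test_files/input_demo.txt", "test_files/predicted_conditions.csv", gpu=-1)` would run the USPTO-Condition model on the CPU. Passing the command as a list avoids shell quoting problems with paths that contain spaces.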

Web Interface

Run the following commands to start the web interface:

cd web_app
python app.py

Open a browser and go to http://127.0.0.1:8000; you should see the following interface:

[Screenshot: Parrot web interface]

The interface supports three input methods:

Draw

[Screenshot: drawing a reaction]

Reaction SMILES

[Screenshot: entering reaction SMILES]

TXT Files

[Screenshot: uploading TXT files]
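Model loading can make the web app slow to start, so it may help to confirm that the server is answering before opening the browser. This is a minimal sketch using only the standard library; `is_server_up` is a hypothetical helper, and the default URL is the address given above.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def is_server_up(url="http://127.0.0.1:8000", timeout=2.0):
    """Return True if something answers HTTP at `url` without a 5xx status."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status < 500
    except HTTPError as err:
        # The server answered, but with an error status (e.g. 404).
        return err.code < 500
    except OSError:
        # URLError subclasses OSError: connection refused, unreachable, or timed out.
        return False
```

Calling `is_server_up()` in a short loop with `time.sleep` between attempts is one way to wait for `python app.py` to finish starting.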

Reproduce the results

[1] Get Dataset

After running the download step in Use Parrot, the fully processed USPTO-Condition, USPTO-Suzuki-Condition, and pretraining datasets are already in dataset/source_dataset/USPTO_condition_final and dataset/pretrain_data. If you want to recreate the USPTO-Condition dataset, see here. The Reaxys-TotalSyn-Condition dataset has to be processed from scratch; we provide the Reaxys IDs of the data and the processing script. For details, see here. The final dataset directory structure should be as follows:

dataset/
├── pretrain_data
│   ├── mlm_rxn_train.txt                # MLM pretrain dataset (train)
│   ├── mlm_rxn_val.txt                  # MLM pretrain dataset (validation)
│   ├── rxn_center_modeling.pkl          # RCM pretrain dataset (train + validation)
│   └── vocab.txt                        # Parrot reaction SMILES vocabulary
└── source_dataset
    ├── Reaxys_total_syn_condition_final
    │   ├── Reaxys_total_syn_condition.csv
    │   ├── Reaxys_total_syn_condition_alldata_idx.pkl
    │   └── Reaxys_total_syn_condition_condition_labels.pkl
    └── USPTO_condition_final
        ├── canonical_pistachio_label.json
        ├── condition_replace_dict_final.json
        ├── USPTO_condition_alldata_idx.pkl
        ├── USPTO_condition_aug_n5_alldata_idx.pkl
        ├── USPTO_condition_aug_n5_condition_labels.pkl
        ├── USPTO_condition_aug_n5.csv
        ├── USPTO_condition_condition_labels.pkl
        ├── USPTO_condition.csv
        ├── USPTO_condition_pred_category.csv
        └── USPTO_condition_pred_category_org.csv


[2] Pretrain

  • Masked Language Modeling pretrain:
    python pretrain_mlm.py --gpu CUDA_ID --config_path configs/pretrain_mlm_config.yaml
    
  • Masked Reaction Center Modeling pretrain:
    python pretrain_rcm.py --gpu CUDA_ID --config_path configs/pretrain_rcm_config.yaml
    

After pretraining, the model states are saved in outputs under best_mlm_uspto_pretrain and best_rcm_uspto_pretrain.

[3] Train Parrot

Training on the USPTO-Condition dataset:

  • Parrot-ML
    python train_parrot_model.py --gpu CUDA_ID --config_path configs/config_uspto_condition.yaml
    
  • Parrot-ML-E
    python train_parrot_model.py --gpu CUDA_ID \
                                --config_path configs/config_uspto_condition_aug_n5_lr_low.yaml
    

Training on the Reaxys-TotalSyn-Condition dataset:

  • Parrot-RCM
    python train_parrot_model.py --gpu CUDA_ID \
                                 --config_path configs/config_reaxys_totalsyn_condition.yaml
    

[4] Test Parrot

Testing on the USPTO-Condition dataset:

  • Parrot-ML-E
    python test_parrot_model.py --gpu CUDA_ID \
                                --config_path configs/config_uspto_condition_aug_n5_lr_low.yaml
    

Testing on the Reaxys-TotalSyn-Condition dataset:

  • Parrot-RCM
    python test_parrot_model.py --gpu CUDA_ID \
                                --config_path configs/config_reaxys_totalsyn_condition.yaml
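The reproduction steps can be chained so that a failure in one stage stops the rest. This is a minimal sketch, not part of the repository: `PIPELINE`, `pipeline_commands`, and `run_pipeline` are hypothetical names, the steps are copied from the commands above for the Parrot-ML-E model on USPTO-Condition, and running both pretraining stages before training is an assumption; check which pretrained checkpoints your config actually loads.

```python
import subprocess
import sys

# (script, config) pairs copied from the commands above.
# Running both pretraining stages first is an assumption.
PIPELINE = [
    ("pretrain_mlm.py", "configs/pretrain_mlm_config.yaml"),
    ("pretrain_rcm.py", "configs/pretrain_rcm_config.yaml"),
    ("train_parrot_model.py", "configs/config_uspto_condition_aug_n5_lr_low.yaml"),
    ("test_parrot_model.py", "configs/config_uspto_condition_aug_n5_lr_low.yaml"),
]


def pipeline_commands(gpu=0, python=sys.executable):
    """Build one argv list per step, using only the --gpu and --config_path flags."""
    return [[python, script, "--gpu", str(gpu), "--config_path", config]
            for script, config in PIPELINE]


def run_pipeline(gpu=0, dry_run=True):
    """Print each step (dry run) or execute them in order, stopping at the first failure."""
    for cmd in pipeline_commands(gpu):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    run_pipeline(gpu=0, dry_run=True)
```

The dry run only prints the commands, which is a cheap way to review them before committing GPU time; pass `dry_run=False` from the Parrot directory to execute them. Swap in the Reaxys config for the Parrot-RCM model.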
    

Cite Us

@article{
doi:10.34133/research.0231,
author = {Xiaorui Wang  and Chang-Yu Hsieh  and Xiaodan Yin  and Jike Wang  and Yuquan Li  and Yafeng Deng  and Dejun Jiang  and Zhenxing Wu  and Hongyan Du  and Hongming Chen  and Yun Li  and Huanxiang Liu  and Yuwei Wang  and Pei Luo  and Tingjun Hou  and Xiaojun Yao },
title = {Generic Interpretable Reaction Condition Predictions with Open Reaction Condition Datasets and Unsupervised Learning of Reaction Center},
journal = {Research},
volume = {6},
pages = {0231},
year = {2023},
doi = {10.34133/research.0231},
URL = {https://spj.science.org/doi/abs/10.34133/research.0231},
eprint = {https://spj.science.org/doi/pdf/10.34133/research.0231},
}

Contributors

dinglee17, wangxr0526, xiaodanyin
