
This project forked from reymond-group/multistepretrosynthesisttl

Multi-Step Retrosynthesis Tool based on Augmented Disconnection Aware Triple Transformer Loop Predictions

License: Other

Multistep Retrosynthesis by a Disconnection Aware Triple Transformer Loop

This repo complements the ChemRxiv preprint "Multistep retrosynthesis combining a disconnection aware triple transformer loop with a route penalty score guided tree search".

The goal of this tool is to predict multistep retrosynthesis routes using a tree search strategy built on the Disconnection-Aware Retrosynthesis model. This single-step retrosynthesis model is augmented by combining bias-free systematic tagging with template-inspired tagging, which identifies reaction-center substructures from known reactions.

Setup Environment

conda create -n MultiStepRetro python=3.8.16 -y
conda activate MultiStepRetro

git clone https://github.com/reymond-group/MultiStepRetrosynthesisTTL.git
cd MultiStepRetrosynthesisTTL
pip install -e .

Download Models and Tagging Templates

Download models from Zenodo:

wget "https://zenodo.org/record/8160148/files/USPTO_STEREO_separated_T0_AutoTag_260000.pt?download=1" -O models/USPTO_STEREO_separated_T0_AutoTag_260000.pt
wget "https://zenodo.org/record/8160148/files/USPTO_STEREO_separated_T1_Retro_255000.pt?download=1" -O models/USPTO_STEREO_separated_T1_Retro_255000.pt
wget "https://zenodo.org/record/8160148/files/USPTO_STEREO_separated_T2_Reagent_Pred_225000.pt?download=1" -O models/USPTO_STEREO_separated_T2_Reagent_Pred_225000.pt
wget "https://zenodo.org/record/8160148/files/USPTO_STEREO_separated_T3_Forward_255000.pt?download=1" -O models/USPTO_STEREO_separated_T3_Forward_255000.pt

Commercial Building Blocks

The list of commercial compounds should be requested and downloaded from MolPort and/or Enamine. SMILES should be canonicalized in the same environment and stored in a single file, one SMILES per line. Set the path to this file in the config file as "commercial_file_path".
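
A minimal sketch of preparing such a file, assuming RDKit is the toolkit used for canonicalization in the MultiStepRetro environment (this README does not state that explicitly). The function names are hypothetical and not part of this repository.

```python
from typing import Callable, Iterable, Optional


def rdkit_canonical(smiles: str) -> Optional[str]:
    """Return the RDKit canonical SMILES, or None if the input cannot be parsed.

    Assumption: RDKit is installed in the MultiStepRetro environment.
    """
    from rdkit import Chem  # imported lazily, only when this function is called

    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol) if mol is not None else None


def write_building_blocks(
    smiles_iter: Iterable[str],
    out_path: str,
    canonicalize: Callable[[str], Optional[str]] = rdkit_canonical,
) -> int:
    """Write unique canonical SMILES to out_path, one per line; return the count."""
    seen = set()
    with open(out_path, "w") as handle:
        for raw in smiles_iter:
            smiles = raw.strip()
            if not smiles:
                continue  # skip blank lines
            canonical = canonicalize(smiles)
            if canonical is None or canonical in seen:
                continue  # skip unparsable entries and duplicates
            seen.add(canonical)
            handle.write(canonical + "\n")
    return len(seen)
```

For example, calling write_building_blocks(open('vendor_raw.smi'), 'commercial.smi') would produce a file whose path can then be set as "commercial_file_path"; both file names are placeholders.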

Usage for Multistep Prediction

Edit the target molecule SMILES and the search parameters in the default configuration file /configs/config_example.yaml. Then, start the multistep prediction in a terminal:

conda activate MultiStepRetro
retrosynthesis --config configs/config_example.yaml
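
To run several targets one after another, the same command can be launched from Python. This is only a sketch that wraps the retrosynthesis --config call shown above; the helper names are hypothetical, and it assumes the MultiStepRetro environment is active so that the retrosynthesis command is on the PATH.

```python
import subprocess
from typing import List


def build_command(config_path: str) -> List[str]:
    """Build the argument list for the CLI call shown in this README."""
    return ["retrosynthesis", "--config", config_path]


def run_configs(config_paths: List[str]) -> None:
    """Run one multistep prediction per config file, stopping at the first failure."""
    for path in config_paths:
        # check=True raises CalledProcessError if the command exits with a non-zero status
        subprocess.run(build_command(path), check=True)
```

Calling run_configs(['configs/config_example.yaml']) is then equivalent to the terminal command above; further config files can be appended to the list.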

Visualizing Results

Results are written into output/project_name/ as pickle files. Forward-validated single-step reaction predictions are stored as output/project_name/DayJob__prediction.pkl, and full predicted route paths are stored as output/project_name/DayJob__tree.pkl, which refers to reactions by their index in prediction.pkl. Routes can be sorted by score to find the best ones. Temporary checkpoints are written to the output/project_name/ folder after each iteration; they let you monitor the progress of the retrosynthesis and resume a job from a checkpoint. If logs are enabled, they are also written into output/project_name/.
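
Finding the right file prefix by hand can be tedious when a project folder holds several runs. The sketch below lists the prefixes for which both result files exist. It assumes the "DayJob" part of the file names is the YYYY_MM_DD__HHMMSS time stamp used in the example further down, so that sorting the prefixes as text also sorts them by time. The helper names are hypothetical.

```python
from pathlib import Path
from typing import List, Optional

PREDICTION_SUFFIX = "__prediction.pkl"
TREE_SUFFIX = "__tree.pkl"


def list_time_stamps(project_dir: str) -> List[str]:
    """Return sorted prefixes that have both a prediction and a tree pickle."""
    folder = Path(project_dir)
    stamps = []
    for pred in folder.glob("*" + PREDICTION_SUFFIX):
        stamp = pred.name[: -len(PREDICTION_SUFFIX)]
        if (folder / (stamp + TREE_SUFFIX)).exists():
            stamps.append(stamp)
    # Assumed YYYY_MM_DD__HHMMSS prefixes sort chronologically as plain text
    return sorted(stamps)


def latest_time_stamp(project_dir: str) -> Optional[str]:
    """Return the most recent complete prefix, or None if there is none."""
    stamps = list_time_stamps(project_dir)
    return stamps[-1] if stamps else None
```

The returned string can be used directly as log_time_stamp when loading the pickle files.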

To visualize predicted routes, see the notebook /notebooks/visualize_results.ipynb or the following example:

import pandas as pd
import ttlretro.view_routes as vr

# Folder name under output/ and file prefix of the run to inspect
project_name = 'config_example'
log_time_stamp = 'YYYY_MM_DD__HHMMSS'

# Load the single-step predictions and the route tree
predictions = pd.read_pickle('output/{}/{}__prediction.pkl'.format(project_name, log_time_stamp))
tree = pd.read_pickle('output/{}/{}__tree.pkl'.format(project_name, log_time_stamp))
tree = vr.get_advanced_scores(tree=tree, predictions=predictions)

# Keep the 10 best routes according to the chosen score
bests = vr.get_best_first_branches(
    tree=tree,
    predictions=predictions,
    num_branches=10,
    score_metric='Fwd_Conf_Score'
)

# Display the route at index 0 of the selection
vr.display_branch(
    branch_tree_index_or_list_rxn_id=0,
    tree=bests,
    predictions=predictions,
    forwarddirection=True
)
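
Routes can also be ranked with ordinary pandas operations, assuming the tree loaded by pd.read_pickle is a DataFrame with one row per route and a score column. The sketch below uses a toy DataFrame in place of the real tree: the Route column and all values are invented for illustration, and only the Fwd_Conf_Score name comes from the example above. Whether a higher or lower value is better depends on the metric, so the sort direction is left as a parameter.

```python
import pandas as pd


def top_routes(
    tree: pd.DataFrame, score_metric: str, n: int = 10, ascending: bool = False
) -> pd.DataFrame:
    """Return the n best rows by score_metric (highest first unless ascending=True)."""
    return tree.sort_values(score_metric, ascending=ascending).head(n)


# Toy stand-in for the real tree; the column values are invented for illustration.
toy_tree = pd.DataFrame(
    {
        "Route": [[0, 3], [1], [2, 4, 5]],
        "Fwd_Conf_Score": [0.62, 0.91, 0.47],
    }
)

best = top_routes(toy_tree, score_metric="Fwd_Conf_Score", n=2)
print(best)
```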

Citations

This repository makes use of existing projects:

OpenNMT-py

OpenNMT: Neural Machine Translation Toolkit

OpenNMT technical report

@inproceedings{klein-etal-2017-opennmt,
    title = "{O}pen{NMT}: Open-Source Toolkit for Neural Machine Translation",
    author = "Klein, Guillaume  and
      Kim, Yoon  and
      Deng, Yuntian  and
      Senellart, Jean  and
      Rush, Alexander",
    booktitle = "Proceedings of {ACL} 2017, System Demonstrations",
    month = jul,
    year = "2017",
    address = "Vancouver, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/P17-4012",
    pages = "67--72",
}

SCScore

Publication: SCScore: Synthetic Complexity Learned from a Reaction Corpus

GitHub repository: SCScore

@article{coley_scscore_2018,
	title = {{SCScore}: {Synthetic} {Complexity} {Learned} from a {Reaction} {Corpus}},
	author = {Coley, Connor W. and Rogers, Luke and Green, William H. and Jensen, Klavs F.},
	volume = {58},
	issn = {1549-9596},
	shorttitle = {{SCScore}},
	url = {https://doi.org/10.1021/acs.jcim.7b00622},
	doi = {10.1021/acs.jcim.7b00622},
	number = {2},
	urldate = {2022-09-02},
	journal = {Journal of Chemical Information and Modeling},
	month = feb,
	year = {2018},
	note = {Publisher: American Chemical Society},
	pages = {252--261},
}

Cite this work

@article{kreutter_multistep_2023,
	title = {Multistep retrosynthesis combining a disconnection aware triple transformer loop with a route penalty score guided tree search},
	author = {Kreutter, David and Reymond, Jean-Louis},
	url = {https://chemrxiv.org/engage/chemrxiv/article-details/6422d09a62fecd2a83937199},
	doi = {10.26434/chemrxiv-2022-8khth-v2},
	publisher = {ChemRxiv},
	month = mar,
	year = {2023},
}
