askcos / askcos-core

Python package for the ASKCOS platform for prediction of chemical reactivity


askcos-core's Introduction

November 2023 Update

We have fully migrated the application, all of its interfaces, underlying models, etc. to GitLab. Please see https://gitlab.com/mlpds_mit/askcosv2/askcos2_core for deployment instructions; components in their own repositories will be pulled automatically during setup based on the config file provided in the askcos2_core repo.

ASKCOS

Software package for the prediction of feasible synthetic routes towards a desired compound and associated tasks related to synthesis planning. Originally developed under the DARPA Make-It program and now being developed under the MLPDS Consortium.

Please note that the MPL 2.0 license for this repository does not apply to the data and trained models. The data and trained models are released under CC BY-NC-SA (i.e., are for noncommercial use only).

Contributors include Connor Coley, Mike Fortunato, Hanyu Gao, Pieter Plehiers, Matthew Cameron, Max Liu, Yuran Wang, Thomas Struble, Jiannan Liu, and Yiming Mo.

Where to find newer ASKCOS releases

This repository contains the original ASKCOS project and will no longer be updated; the final release is v0.4.1. However, the ASKCOS project is still under very active development! The project has been split into multiple smaller repositories for several reasons, including improved modularity. The askcos, deploy, and makeit directories have been migrated to askcos-site, askcos-deploy, and askcos-core, respectively. In addition, version numbers have changed to a date-based system, with the first new open-source release being version 2020.07. We highly encourage you to check out and update to the new version! For any questions or comments, please contact [email protected].

Quick start using Google Cloud

# (1) Create a Google Cloud instance
#     - recommended specs: 8 vCPUs, 64 GB memory
#     - select Ubuntu 18.04 LTS Minimal
#     - upgrade to a 100 GB disk
#     - allow HTTP and HTTPS traffic

# (2) Install docker
#     - https://docs.docker.com/engine/install/ubuntu/
sudo apt-get update
sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common -y
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io -y
sudo groupadd docker
sudo usermod -aG docker $USER
newgrp docker

# (3) Install docker-compose
#     - https://docs.docker.com/compose/install/
sudo curl -L "https://github.com/docker/compose/releases/download/1.27.4/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
sudo ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose

# (4) Install git lfs
#     - https://github.com/git-lfs/git-lfs/wiki/Installation
sudo apt-get install software-properties-common -y
sudo add-apt-repository ppa:git-core/ppa -y
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs -y
git lfs install

# (5) Pull
git clone https://github.com/ASKCOS/ASKCOS
cd ASKCOS
git lfs pull

# (6) Build & run
docker build -t askcos/askcos . # build
cd deploy
bash deploy.sh deploy     # start containers (detached) and run other initialization tasks
docker-compose logs -f    # start tailing logs (can CTRL+C to exit)

# (7) Navigate to your instance's external IP
#     - note that it may take ~5 minutes for the retro transformer workers to start up
#     - you can check the status of their startup by looking at "server status"
#     - the first request to a website process may take ~10 seconds
#     - the first request to a retro transform worker may take ~5-10 seconds
#     - the first request to the forward predictor may take ~60 seconds

First Time Deployment with Docker

Prerequisites

You will need Docker, docker-compose, and Git LFS installed (see the quick start above for installation commands).

(Optional) Building the ASKCOS Image

The askcos image itself can be built using the Dockerfile in this repository.

$ git clone https://github.com/ASKCOS/ASKCOS
$ cd ASKCOS
$ git lfs pull
$ docker build -t askcos/askcos .

NOTE: For application deployment, double-check the image tag used in the docker-compose.yml file and be sure to tag your newly built image with the same name. Otherwise, the image specified in docker-compose.yml will be pulled and deployed instead of the image you just built.

Deploying the Web Application

Deployment is initiated by a bash script that runs a few docker-compose commands in a specific order. Several database services need to be started first and, more importantly, seeded with data before other services (which rely on the availability of data in the database) can start. The bash script lives in the deploy folder and should be run from there as follows:

$ bash deploy.sh command [optional arguments]

There are a number of available commands, including the following for common deploy tasks:

  • deploy: runs standard first-time deployment tasks, including seed-db
  • update: pulls a new docker image from the GitLab repository and restarts all services
  • seed-db: seed the database with default or custom data files
  • start: start a deployment without performing first-time tasks
  • stop: stop a running deployment
  • clean: stop a running deployment and remove all docker containers

For a running deployment, new data can be seeded into the database using the seed-db command along with arguments indicating the types of data to be seeded. Note that this will replace the existing data in the database. The available arguments are as follows:

  • -b, --buyables: specify buyables data to seed, either default or path to data file
  • -c, --chemicals: specify chemicals data to seed, either default or path to data file
  • -x, --reactions: specify reactions data to seed, either default or path to data file
  • -r, --retro-templates: specify retrosynthetic templates to seed, either default or path to data file
  • -f, --forward-templates: specify forward templates to seed, either default or path to data file

For example, to seed default buyables data and custom retrosynthetic templates, run the following from the deploy folder:

$ bash deploy.sh seed-db --buyables default --retro-templates /path/to/my.retro.templates.json.gz

To update a deployment, run the following from the deploy folder:

$ bash deploy.sh update --version x.y.z

To stop a currently running application, run the following from the deploy folder:

$ bash deploy.sh stop

If you would like to clean up and remove everything from a previous deployment (NOTE: you will lose user data), run the following from the deploy folder:

$ bash deploy.sh clean

Important Notes

Recommended hardware

We recommend running this code on a machine with at least 8 compute cores (16 preferred) and 64 GB RAM (128 GB preferred).

First startup

The celery worker will take a few minutes to start up (possibly up to 5 minutes; it reads a lot of data into memory from disk). The web app itself will be ready before this; however, upon the first GET request (only the first for each process), a few files will be read from disk, so expect a 10-15 second delay.

Scaling workers

Only 1 worker per queue is deployed by default, with limited concurrency. This is not ideal for serving many concurrent users. You can easily scale the number of celery workers with

$ docker-compose up -d --scale tb_c_worker=N tb_c_worker

where N is the number of workers you want. The startup note above applies to each worker you start, and each worker will consume additional RAM. You can also adjust the default number of workers via the variables at the top of the deploy.sh script.

Managing Django

If you'd like to manage the Django app (i.e., run python manage.py ...), for example to create an admin superuser, you can run commands in the running app service (after docker-compose up) as follows:

$ docker-compose exec app bash -c "python /usr/local/ASKCOS/askcos/manage.py createsuperuser"

In this case you'll be presented with an interactive prompt to create a superuser with your desired credentials.

Data migration to askcos-data

In the v0.4.1 release of ASKCOS, data and models have been migrated to a separate repository at https://github.com/ASKCOS/askcos-data. The pre-built ASKCOS Docker image available from Docker Hub already contains the data and models. For local use, you will need to clone the askcos-data repository separately:

$ cd ASKCOS/makeit
$ git clone https://github.com/ASKCOS/askcos-data data

How to run individual modules

Many of the individual modules -- at least the most interesting ones -- can be run standalone. Examples of how to use them can often be found in the if __name__ == '__main__' block at the bottom of each script. For example...

Using the learned synthetic complexity metric (SCScore)

makeit/prioritization/precursors/scscore.py
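
For instance, a minimal scoring sketch, assuming the class and method names below (SCScorePrecursorPrioritizer, load_model, get_score_from_smiles) match the module's __main__ block; verify against the script before relying on it:

from makeit.prioritization.precursors.scscore import SCScorePrecursorPrioritizer

model = SCScorePrecursorPrioritizer()
model.load_model(model_tag='1024bool')  # assumed tag for the 1024-bit boolean fingerprint model

# SCScore ranges from roughly 1 (simple) to 5 (complex); higher means harder to synthesize
score = model.get_score_from_smiles('CCCOCCC', noprice=True)  # noprice: skip the price lookup (assumed kwarg)
print(score)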

Obtaining a single-step retrosynthetic suggestion with consideration of chirality

makeit/retrosynthetic/transformer.py
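
A minimal sketch, following the import pattern shown in the issues further down this page (there under the newer askcos package name); the get_outcomes call is an assumption, so check the __main__ block for the exact API:

import makeit.retrosynthetic.transformer as retro_trans

# use_db=False: read templates from local data files rather than MongoDB (assumed flag behavior)
t = retro_trans.RetroTransformer(load_all=False, use_db=False)
t.load()  # loads ~160k retrosynthetic templates; can take several minutes

# Scored, chirality-aware single-step precursor suggestions for a target SMILES
outcomes = t.get_outcomes('CCOC(=O)c1ccc(N)cc1')  # method name assumed
print(outcomes)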

Finding recommended reaction conditions based on a trained neural network model

makeit/synthetic/context/neuralnetwork.py
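
A sketch of requesting conditions; the class name matches the module, but the load_nn_model/get_n_conditions signatures and the global_config keys are assumptions based on the module layout:

import makeit.global_config as gc
from makeit.synthetic.context.neuralnetwork import NeuralNetContextRecommender

cont = NeuralNetContextRecommender()
cont.load_nn_model(
    model_path=gc.NEURALNET_CONTEXT_REC['model_path'],    # config keys assumed
    info_path=gc.NEURALNET_CONTEXT_REC['info_path'],
    weights_path=gc.NEURALNET_CONTEXT_REC['weights_path'],
)

# Top-10 recommended conditions (temperature, solvent, reagents, catalyst) for a reaction SMILES
print(cont.get_n_conditions('CCO.CC(=O)O>>CCOC(C)=O', 10))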

Using the template-free forward predictor

makeit/synthetic/evaluation/template_free.py
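
A sketch mirroring the command-line usage that appears in the issues further down this page (python askcos/synthetic/evaluation/template_free.py 'reactantA.reactantB'); the TemplateFreeNeuralNetScorer constructor is confirmed by the tracebacks in those issues, while the evaluate call and its return shape are assumptions:

from makeit.synthetic.evaluation.template_free import TemplateFreeNeuralNetScorer

scorer = TemplateFreeNeuralNetScorer()  # loads the rexgen_direct model; the first call is slow

# Predict likely major products for dot-separated reactant SMILES
outcomes = scorer.evaluate('CCCO.CCCN')  # call signature assumed; see the __main__ block
print(outcomes)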

Using the coarse "fast filter" (binary classifier) for evaluating reaction plausibility

makeit/synthetic/evaluation/fast_filter.py
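
A hedged sketch, assuming a FastFilterScorer class with load/evaluate methods; confirm the actual names, and whether load needs an explicit model path, in the script itself:

from makeit.synthetic.evaluation.fast_filter import FastFilterScorer

ff = FastFilterScorer()
ff.load()  # may require an explicit model path argument (assumption)

# Plausibility score in [0, 1] for reactants >> product
score = ff.evaluate('CCO.CC(=O)O', 'CCOC(C)=O')  # signature assumed
print(score)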

Integrated CASP tool

For the integrated synthesis planning tool at makeit/application/run.py, there are several options available. The currently enabled options for the command-line tool can be found in makeit/utilities/io/arg_parser.py. Some options are only available on the website and some only in the command-line version; an example of the former is the consideration of popular but non-buyable chemicals as suitable "leaf nodes" in the search. It is highly recommended to use the web interface when possible.
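
As a purely illustrative sketch, the command-line planner could be driven from Python via subprocess; the flag names below are assumptions, and makeit/utilities/io/arg_parser.py is the authoritative list:

import subprocess

# --TARGET and --expansion_time are hypothetical flag names used for illustration;
# consult makeit/utilities/io/arg_parser.py for the real options.
subprocess.run([
    'python', 'makeit/application/run.py',
    '--TARGET', 'CC(=O)Nc1ccc(O)cc1',  # e.g., acetaminophen as the target SMILES
    '--expansion_time', '60',
], check=True)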

askcos-core's People

Contributors

connorcoley, jsschreck, ljn917, mefortunato, mliu49, murninm, pieterplehiers, prithaverma, spenceydelgado, thomasstruble, wangyuran, yanfeiguan


askcos-core's Issues

Wondering about dataset update

First of all, thank you for your wonderful work in the CASP area.
I have been using ASKCOS in Python via Docker.
Is the askcos-data included in the Docker image you provide the newest version (e.g., public release 2020.07)?
I'd really appreciate a reply.

tensorflow gpu

I'm running the individual template-free forward predictor module.

  1. Is there an option or argument to run the script on a GPU? I will be running millions of building blocks, which will take a long time. Do you have any suggestions for processing large batches? Thanks.
  2. I got different results between the individual module and the online predictor.
    For example, react1: NNC(=O)CCc1ccc(S(=O)(=O)N2CCOCC2)cc1, react2: O=C(CSC1=CCS(=O)(=O)C1)Nc1ccccc1.
    On askcos.mit.edu, I input those two reactants and leave the solvent blank.
    The result shows O=C(CSC1CCS(=O)(=O)C1)Nc1ccccc1 with a probability of 0.8978.
    However, from python askcos/synthetic/evaluation/template_free.py 'NNC(=O)CCc1ccc(S(=O)(=O)N2CCOCC2)cc1.O=C(CSC1=CCS(=O)(=O)C1)Nc1ccccc1', I got NNC(=O)CCc1ccc(S(=O)(=O)N2CCOCC2)cc1 with a probability of 0.9059. This predicted product is actually one of the reactants.
    Thanks

error when using ASKCOS core

Hi,

I am interested in the ASKCOS tool and am trying to use the retrosynthetic transformer. I followed the instructions to install the package and ran the command python3 askcos/retrosynthetic/transformer.py,
but got the error shown in the attached screenshot (error3).
Could you please help me with this problem? Thanks!

retro transformer.py issue

I encountered the error message below when I ran transformer.py individually.
INFO@retro_transformer : [1.278s] Loading precursor prioritizer for RetroTransformer
INFO@pricer : [2.305s] Cannot connect to mongodb to load prices
INFO@pricer : [3.839s] Loaded prices from flat file
INFO@retro_transformer : [3.846s] Loading fast filter for RetroTransformer
INFO@fast_filter : [3.861s] Starting to load fast filter
INFO@fast_filter : [5.480s] Done loading fast filter
INFO@retro_transformer : [5.497s] Using default clustering for RetroTransformer
INFO@pricer : [6.543s] Cannot connect to mongodb to load prices
INFO@pricer : [7.897s] Loaded prices from flat file
INFO@retro_transformer : [7.913s] Loading retro-synthetic transformer
INFO@retro_transformer : [7.928s] reading from file
INFO@template_transformer: [7.942s] Loading templates from askcos-core/askcos/data/templates/retro.templates.json.gz
INFO@template_transformer: [14.288s] Loaded templates. Using 163723 templates
[(1, 'CCOC(=O)[C@H]1CC@@HC@@HN1', 109659, [], 0.0)]

Thanks

error: UnicodeDecodeError: 'utf-8' codec can't decode byte

Hi,

I am trying to use the retrosynthetic transformer and encountered a problem. I created a Jupyter notebook in the askcos-core folder, and I can successfully import the packages with:
import time
import unittest
import askcos.global_config as gc
import askcos.retrosynthetic.transformer as retro_trans

However, when I try to use the transformer with the following code:
t = retro_trans.RetroTransformer(load_all=False, use_db=True)
t.load()
I get the error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xce in position 169: invalid continuation byte

The details can also be seen in the attached screenshots (error1, error2).
I am confused about this; could you please help me with it? Many thanks.

problem running individual modules

I followed the instructions in the README.
I created an anaconda environment from environment.yml and checked that the environment includes all packages in requirements.txt.
I downloaded askcos-data and put it in askcos-core/askcos/data.
Then I added the project to PYTHONPATH (I used *** to mark personal info here):
On CentOS 7: export PYTHONPATH=${PYTHONPATH}:*** /askcos-core
On Windows 10, I used the GUI to add *** /askcos-core to PATH under the environment variables.
After finishing the steps above, I ran the command below:
python askcos/synthetic/evaluation/template_free.py CCCO.CCCN
and got the error messages below:
2022-03-24 10:35:50.902925: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Out of range: Read less bytes than requested
Traceback (most recent call last):
File "*** /anaconda3/envs/askcos/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(args)
File "
** /anaconda3/envs/askcos/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
target_list, run_metadata)
File "*** /anaconda3/envs/askcos/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.OutOfRangeError: Read less bytes than requested
[[{{node save/RestoreV2}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "askcos/synthetic/evaluation/template_free.py", line 131, in
scorer = TemplateFreeNeuralNetScorer()
File "askcos/synthetic/evaluation/template_free.py", line 31, in __ init __
self.model = TFFP()
File "*** /askcos-core/askcos/synthetic/evaluation/rexgen_direct/predict.py", line 11, in __ init __
self.finder.load_model()
File "*** /askcos-core/askcos/synthetic/evaluation/rexgen_direct/core_wln_global/directcorefinder.py", line 83, in load_model
saver.restore(self.session, model_path)
File "*** /anaconda3/envs/askcos/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 1290, in restore
{self.saver_def.filename_tensor_name: save_path})
File "*** /anaconda3/envs/askcos/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
run_metadata_ptr)
File "*** /anaconda3/envs/askcos/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
feed_dict_tensor, options, run_metadata)
File "*** /anaconda3/envs/askcos/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
run_metadata)
File "*** /anaconda3/envs/askcos/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: Read less bytes than requested
[[node save/RestoreV2 (defined at *** /anaconda3/envs/askcos/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1751) ]]

Original stack trace for 'save/RestoreV2':
File "askcos/synthetic/evaluation/template_free.py", line 131, in
scorer = TemplateFreeNeuralNetScorer()
File "askcos/synthetic/evaluation/template_free.py", line 31, in __ init __
self.model = TFFP()
File "*** /askcos-core/askcos/synthetic/evaluation/rexgen_direct/predict.py", line 11, in __ init __
self.finder.load_model()
File "*** /askcos-core/askcos/synthetic/evaluation/rexgen_direct/core_wln_global/directcorefinder.py", line 82, in load_model
saver = tf.train.Saver()
File "*** /anaconda3/envs/askcos/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 828, in __ init __
self.build()
File "*** /anaconda3/envs/askcos/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 840, in build
self._build(self._filename, build_save=True, build_restore=True)
File "*** /anaconda3/envs/askcos/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 878, in _build
build_restore=build_restore)
File "*** /anaconda3/envs/askcos/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 508, in _build_internal
restore_sequentially, reshape)
File "*** /anaconda3/envs/askcos/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 328, in _AddRestoreOps
restore_sequentially)
File "*** /anaconda3/envs/askcos/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 575, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "*** /anaconda3/envs/askcos/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_io_ops.py", line 1696, in restore_v2
name=name)
File "*** /anaconda3/envs/askcos/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 793, in _apply_op_helper
op_def=op_def)
File "*** /anaconda3/envs/askcos/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(args, kwargs)
File "
/anaconda3/envs/askcos/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3360, in create_op
attrs, op_def, compute_device)
File "*** /anaconda3/envs/askcos/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3429, in _create_op_internal
op_def=op_def)
File "*** /anaconda3/envs/askcos/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1751, in __ init __
self._traceback = tf_stack.extract_stack()

Thanks
