askcos / askcos

Software package for computer-aided synthesis planning

Home Page: https://askcos.mit.edu

License: Other


ASKCOS's Introduction

November 2023 Update

We have fully migrated the application and all of its interfaces, underlying models, etc. to GitLab. Please see https://gitlab.com/mlpds_mit/askcosv2/askcos2_core for deployment instructions; components in their own repositories will be pulled automatically during setup, based on the config file provided in the askcos2_core repo.

ASKCOS

Software package for the prediction of feasible synthetic routes towards a desired compound and associated tasks related to synthesis planning. Originally developed under the DARPA Make-It program and now being developed under the MLPDS Consortium.

Please note that the MPL 2.0 license for this repository does not apply to the data and trained models. The data and trained models are released under CC BY-NC-SA (i.e., are for noncommercial use only).

Contributors include Connor Coley, Mike Fortunato, Hanyu Gao, Pieter Plehiers, Matthew Cameron, Max Liu, Yuran Wang, Thomas Struble, Jiannan Liu, and Yiming Mo.

Where to find newer ASKCOS releases

This repository contains the original ASKCOS project and will no longer be updated; the final release is v0.4.1. However, the ASKCOS project is still under very active development! The project has been split into multiple smaller repositories for several reasons, including improved modularity. The askcos, deploy, and makeit directories have been migrated to askcos-site, askcos-deploy, and askcos-core, respectively. In addition, version numbers have changed to a date-based system, with the first new open-source release being version 2020.07. We highly encourage you to check out and update to the new version! For any questions or comments, please contact [email protected].

Quick start using Google Cloud

# (1) Create a Google Cloud instance
#     - recommended specs: 8 vCPUs, 64 GB memory
#     - select Ubuntu 18.04 LTS Minimal
#     - upgrade to a 100 GB disk
#     - allow HTTP and HTTPS traffic

# (2) Install docker
#     - https://docs.docker.com/engine/install/ubuntu/
sudo apt-get update
sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common -y
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io -y
sudo groupadd docker
sudo usermod -aG docker $USER
newgrp docker

# (3) Install docker-compose
#     - https://docs.docker.com/compose/install/
sudo curl -L "https://github.com/docker/compose/releases/download/1.27.4/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
sudo ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose

# (4) Install git lfs
#     - https://github.com/git-lfs/git-lfs/wiki/Installation
sudo apt-get install software-properties-common -y
sudo add-apt-repository ppa:git-core/ppa -y
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs -y
git lfs install

# (5) Pull
git clone https://github.com/ASKCOS/ASKCOS
cd ASKCOS
git lfs pull

# (6) Build & run
docker build -t askcos/askcos . # build
cd deploy
bash deploy.sh deploy     # start containers (detached) and run other initialization tasks
docker-compose logs -f    # start tailing logs (can CTRL+C to exit)

# (7) Navigate to your instance's external IP
#     - note that it may take ~5 minutes for the retro transformer workers to start up
#     - you can check the status of their startup by looking at "server status"
#     - the first request to a website process may take ~10 seconds
#     - the first request to a retro transform worker may take ~5-10 seconds
#     - the first request to the forward predictor may take ~60 seconds
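
# (8) Optional sanity checks (not part of the original quick start; service
#     names and port mappings come from deploy/docker-compose.yml and may
#     differ by version)
docker-compose ps                    # all services should show "Up"
curl -sk https://localhost/ | head   # assumes nginx publishes ports 80/443 locally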

First Time Deployment with Docker

Prerequisites

(Optional) Building the ASKCOS Image

The askcos image itself can be built using the Dockerfile in this repository.

$ git clone https://github.com/ASKCOS/ASKCOS
$ cd ASKCOS
$ git lfs pull
$ docker build -t askcos/askcos .

NOTE: For application deployment, double-check the image tag used in the docker-compose.yml file and be sure to tag your newly built image with the same image name. Otherwise, the image tag referenced in docker-compose.yml will be pulled and deployed instead of the image you just built.
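
For example, if docker-compose.yml references a versioned tag, retag your local build to match before deploying (the x.y.z tag here is illustrative, not the actual value in the file):

$ docker tag askcos/askcos:latest askcos/askcos:x.y.z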

Deploying the Web Application

Deployment is initiated by a bash script that runs a few docker-compose commands in a specific order. Several database services need to be started first and, more importantly, seeded with data before other services (which rely on the availability of that data) can start. The bash script can be found in, and should be run from, the deploy folder as follows:

$ bash deploy.sh command [optional arguments]

There are a number of available commands, including the following for common deploy tasks:

  • deploy: runs standard first-time deployment tasks, including seed-db
  • update: pulls new docker image from GitLab repository and restarts all services
  • seed-db: seed the database with default or custom data files
  • start: start a deployment without performing first-time tasks
  • stop: stop a running deployment
  • clean: stop a running deployment and remove all docker containers

For a running deployment, new data can be seeded into the database using the seed-db command along with arguments indicating the types of data to be seeded. Note that this will replace the existing data in the database. The available arguments are as follows:

  • -b, --buyables: specify buyables data to seed, either default or path to data file
  • -c, --chemicals: specify chemicals data to seed, either default or path to data file
  • -x, --reactions: specify reactions data to seed, either default or path to data file
  • -r, --retro-templates: specify retrosynthetic templates to seed, either default or path to data file
  • -f, --forward-templates: specify forward templates to seed, either default or path to data file

For example, to seed default buyables data and custom retrosynthetic templates, run the following from the deploy folder:

$ bash deploy.sh seed-db --buyables default --retro-templates /path/to/my.retro.templates.json.gz

To update a deployment, run the following from the deploy folder:

$ bash deploy.sh update --version x.y.z

To stop a currently running application, run the following from the deploy folder:

$ bash deploy.sh stop

If you would like to clean up and remove everything from a previous deployment (NOTE: you will lose user data), run the following from the deploy folder:

$ bash deploy.sh clean

Important Notes

Recommended hardware

We recommend running this code on a machine with at least 8 compute cores (16 preferred) and 64 GB RAM (128 GB preferred).

First startup

The celery worker will take a few minutes to start up (possibly up to 5 minutes; it reads a lot of data into memory from disk). The web app itself will be ready before this; however, upon the first GET request (only the first for each process), a few files are read from disk, so expect a 10-15 second delay.
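
To watch the workers come up, you can tail the logs of a specific service from the deploy folder (tb_c_worker is the one-step retrosynthesis worker service referenced in the scaling section below; other service names are listed in docker-compose.yml):

$ docker-compose logs -f tb_c_worker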

Scaling workers

Only one worker per queue is deployed by default, with limited concurrency. This is not ideal when serving many users. You can easily scale the number of celery workers with

$ docker-compose up -d --scale tb_c_worker=N tb_c_worker

where N is the number of workers you want. The startup note above applies to each worker you start, however, and each worker will consume additional RAM. You can also adjust the default number of workers via the variables at the top of the deploy.sh script.

Managing Django

If you'd like to manage the Django app (i.e., run python manage.py ...), for example to create an admin superuser, you can run commands in the running app service (after docker-compose up) as follows:

$ docker-compose exec app bash -c "python /usr/local/ASKCOS/askcos/manage.py createsuperuser"

In this case, you'll be presented with an interactive prompt to create a superuser with your desired credentials.

Data migration to askcos-data

In the v0.4.1 release of ASKCOS, data and models have been migrated to a separate repository at https://github.com/ASKCOS/askcos-data. The pre-built ASKCOS Docker image available from Docker Hub already contains the data and models. For local use, you will need to clone the askcos-data repository separately:

$ cd ASKCOS/makeit
$ git clone https://github.com/ASKCOS/askcos-data data

How to run individual modules

Many of the individual modules -- at least the most interesting ones -- can be run "standalone". Examples of how to use them are often found in the if __name__ == '__main__' block at the bottom of each script (a hedged usage sketch for the SCScore module follows this list). For example...

Using the learned synthetic complexity metric (SCScore)

makeit/prioritization/precursors/scscore.py

Obtaining a single-step retrosynthetic suggestion with consideration of chirality

makeit/retrosynthetic/transformer.py

Finding recommended reaction conditions based on a trained neural network model

makeit/synthetic/context/neuralnetwork.py

Using the template-free forward predictor

makeit/synthetic/evaluation/template_free.py

Using the coarse "fast filter" (binary classifier) for evaluating reaction plausibility

makeit/synthetic/evaluation/fast_filter.py
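
As referenced above, here is a minimal sketch of calling the SCScore module directly (e.g., inside the running container). The class and method names (SCScorePrecursorPrioritizer, load_model, get_score_from_smiles) and the model tag are assumptions patterned after the module's __main__ block and may differ between versions:

from makeit.prioritization.precursors.scscore import SCScorePrecursorPrioritizer

# Instantiate and load the trained SCScore model; model_tag selects the
# fingerprint variant used during training (assumed name)
model = SCScorePrecursorPrioritizer()
model.load_model(model_tag='1024bool')

# SCScore ranges from roughly 1 (simple) to 5 (complex)
print(model.get_score_from_smiles('CCCOCCC'))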

Integrated CASP tool

For the integrated synthesis planning tool at makeit/application/run.py, there are several options available. The currently enabled options for the command-line tool can be found in makeit/utilities/io/arg_parser.py. Some options are only available on the website and some only in the command-line version; an example of the former is the consideration of popular but non-buyable chemicals as suitable "leaf nodes" in the search. It is highly recommended to use the web interface when possible.
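
A hypothetical invocation is shown below; the flag names and values are illustrative only, so consult makeit/utilities/io/arg_parser.py for the authoritative options:

$ python makeit/application/run.py --TARGET 'CN(C)CCOC(c1ccccc1)c1ccccc1' --expansion_time 60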


ASKCOS's Issues

Does the rollout policy in makeit/retrosynthetic/mcts/tree_builder.py use the neural network?

According to Marwin Segler's work, a one-hidden-layer NN is used to randomly sample templates for estimating the position value during the rollout phase, and another NN is used in the expansion step to guide the search in a promising direction.

I'm wondering whether makeit/retrosynthetic/mcts/tree_builder.py uses the same idea, applying the respective NNs in the expansion and rollout steps. I can only find the trained model's weights & biases for the expansion step in ASKCOS/makeit/data/prioritization/, and none for the rollout step anywhere under ASKCOS/makeit/data/. The rollout step in ASKCOS seems to directly reuse the output (top_probs, top_indeces) of the template relevance NN from the expansion step to obtain the top (sorted) templates.

Standalone modules: impurity predictor and fast_filter.py throwing errors

Hi, I am unable to run the impurity predictor module as a standalone module. [error screenshot attached]

There is also an error when running the fast filter module as a standalone module: it throws a KeyError for the model path, even though the model path is set to the right directory, makeit/data/models/fast_filter/1/ (pulled from the askcos-data repo).

Any help would be much appreciated!

What's the meaning of the 'efgs' key in self.templates?

I can guess the meaning of template-info fields such as count, rxn_example, name, reference, references, intra_only, incompatible_groups, ..., but I can't guess the meaning of the field 'efgs'. What does the 'efgs' key in self.templates mean?

Retro templates not found from bash script

Do we need to change the bash script? The retro templates and forward templates are not loaded even after I change the bash script path to the local database. (I cloned the askcos-data repo into the makeit folder.)

[screenshot attached]

Connecting to the Reaxys backend

For the One-Step Retrosynthesis module, if I click on a recommended reactants image, it takes me to the template page and says that the Reaxys backend is not configured. Is there a way to configure this backend?

ImportError: No module named urls

Greetings,
I am attempting to get a basic build of this project going. I followed the installation steps, but on docker-compose up I'm receiving the following (shortened for brevity):

cr_network_worker_1    |   File "/usr/local/lib/python2.7/dist-packages/celery/loaders/base.py", line 108, in import_default_modules
cr_network_worker_1    |     raise response
cr_network_worker_1    | ImportError: No module named urls

The same error occurs for multiple containers, including cr_network_worker_1, te_coordinator_1, sc_coordinator_1, tb_c_worker_1, cr_coordinator_1, tb_coordinator_mcts_1, and others.

It looks like Python 2 is used by the project, so there is likely some dependency issue. My host is macOS, but I'm doing everything in Docker, so that shouldn't be an issue.

What's the difference between the arguments max_branching and template_count in arg_parser.py?

I have looked through most of the scripts in ASKCOS; however, I still cannot figure out the precise meaning of max_branching and template_count.

For a simple example, if I set max_branching=2, template_count=100, depth=1 and start from a root A to build a tree, I expect to get two child nodes of A (call them B and C) at the expansion step, since the branching factor is 2. But where is the argument template_count used? Is max_branching just a more restrictive argument than template_count, with both having the same meaning?

The comment for template_count says: 'Maximum number of templates to be applied for each expansion step'. I had thought that root A applies up to template_count=100 templates to generate the child nodes B & C during expansion. However, that seems to be a wrong concept, because then root A would be performing the same action twice...

Thank you for your help.

A question regarding your choice of reaction predictor for ImpurityPredictor class

I have been reading the code to understand how you implemented impurity prediction. While looking at the predictor method, I realized that WLN is used to generate products for the given reactants. In contrast to other modules, such as forward reaction prediction and retrosynthesis, there is no model option for the impurity prediction module on the ASKCOS website. Is there any particular reason why you applied WLN but did not provide additional forward predictors such as graph2smiles?

Separately, why did you decide to use a threshold of 30 percent for the substructure match in ImpurityPrediction.check_mode_outcome? I know that finding substructures is not trivial, but I wondered whether some preliminary results led you to apply the function this way.

Thanks for making your amazing application available and open source.

Docker IP not reflecting on Browser

[screenshot of running containers attached]

The containers shown are active. I can ping the Docker container IP with ping <ip>, but that IP is not reachable in the browser. Do I need to change the HOSTNAME in the .env file? Also, which container is the entry point? Is it rabbitmq?

Specific molecule causes ASKCOS to crash

During a screen of ca. 100 molecules, I found one that crashed ASKCOS: O=C(NC1=C(C=CC1)F)NC2=CC=CCC2, with the error message shown here.

The error message is:

    INFO@mcts_tree_builder : [360.590s] Starting cooridnation loop
    Traceback (most recent call last):
      File "covid_csv.py", line 132, in <module>
        soft_stop=True)
      File "/usr/local/ASKCOS/makeit/retrosynthetic/mcts/tree_builder.py", line 1105, in get_buyable_paths
        return_first=return_first,
      File "/usr/local/ASKCOS/makeit/retrosynthetic/mcts/tree_builder.py", line 744, in build_tree
        forbidden_molecules=forbidden_molecules, return_first=return_first)
      File "/usr/local/ASKCOS/makeit/retrosynthetic/mcts/tree_builder.py", line 438, in coordinate
        if all(pathway == {} for pathway in self.active_pathways) and len(self.pending_results) == 0:
    AttributeError: MCTS instance has no attribute 'pending_results'

The settings are:

    smiles_list = ['O=C(NC1=C(C=CC1)F)NC2=CC=CCC2']
    for smiles in smiles_list:
        smiles = Chem.MolToSmiles(Chem.MolFromSmiles(smiles), True)
        status, paths = Tree.get_buyable_paths(smiles,
                                               nproc=NCPUS,
                                               max_depth=9,
                                               max_branching=25,
                                               expansion_time=60,
                                               max_cum_template_prob=0.999,
                                               template_count=1000,
                                               max_ppg=100,
                                               filter_threshold=0.1,
                                               return_first=True,
                                               soft_reset=False,
                                               soft_stop=True)

starting materials database

Hi,

I am trying to use this tool, and I wonder where I can find the starting materials (building blocks) database file, since I want to add some molecules to the original database. Thanks in advance!

Best,

Individual modules not working

Hey, I'm trying to run some individual modules, like makeit/retrosynthetic/mcts/tree_builder.py, inside the Docker container, but the script fails with the following error:

Using Theano backend.
Traceback (most recent call last):
  File "makeit/retrosynthetic/mcts/tree_builder.py", line 1, in <module>
    from makeit.retrosynthetic.transformer import RetroTransformer
  File "/home/askcos/ASKCOS/makeit/retrosynthetic/transformer.py", line 25, in <module>
    from makeit.synthetic.evaluation.fast_filter import FastFilterScorer
  File "/home/askcos/ASKCOS/makeit/synthetic/evaluation/fast_filter.py", line 1, in <module>
    from makeit.utilities.fastfilter_utilities import Highway_self, pos_ct, true_pos, real_pos, set_keras_backend
  File "/home/askcos/ASKCOS/makeit/utilities/fastfilter_utilities.py", line 12, in <module>
    from keras.layers import merge, activations
ImportError: cannot import name activations

It looks like the wrong version of Keras is installed in the Docker image. Do you have any ideas on how to fix this?

Thanks,
Maria

Change 'if idx not in atoms_to_use:' to 'if idx not in new_atoms_to_use:'?

def expand_atoms_to_use(mol, atoms_to_use, groups=[], symbol_replacements=[]):
    '''Given an RDKit molecule and a list of AtomIdX which should be included
    in the reaction, this function expands the list of AtomIdXs to include one 
    nearest neighbor with special consideration of (a) unimportant neighbors and
    (b) important functional groupings'''

    # Copy
    new_atoms_to_use = atoms_to_use[:]

    # Look for all atoms in the current list of atoms to use
    for atom in mol.GetAtoms():
        if atom.GetIdx() not in atoms_to_use: continue
        # Ensure membership of changed atom is checked against group
        for group in groups:
            if int(atom.GetIdx()) in group[0]:
                if v: 
                    print('adding group due to match')
                    try:
                        print('Match from molAtomMapNum {}'.format(
                            atom.GetProp('molAtomMapNumber'),
                        ))
                    except KeyError:
                        pass
                for idx in group[1]:
                    if idx not in atoms_to_use:   # if idx not in new_atoms_to_use: 
                        new_atoms_to_use.append(idx)
                        symbol_replacements.append((idx, convert_atom_to_wildcard(mol.GetAtomWithIdx(idx))))

Server 500 Error - One-Step Retrosynthesis, ASKCOS v0.2.9

We are currently using an outdated version of ASKCOS on an internal network on RHEL7. We've installed the relevant dependencies and pip modules manually and have the server working and displaying a home page; many of the top-level links work. Due to our network restrictions, we are NOT running this in Docker; instead, we have a run script roughly based on this repo's docker-compose.yml, which runs uwsgi and each worker in the background when started.

However, when navigating to One-Step Retrosynthesis and clicking on any molecule, we get a Server 500 error with no relevant log messages. I've done some digging and found the following stack trace: from askcos_site.main.views.retro:149, when res.get is called, we see the following error:

[2021-07-28 17:46:57,314: ERROR/ForkPoolWorker-8] Task askcos_site.askcos_celery.treebuilder.tb_c_worker.get_top_precursors[ea24cc1b-0bde-46c4-895b-5127c92dc0c2] raised unexpected: AttributeError("'NoneType' object has no attribute 'get_outcomes'",)
Traceback (most recent call last):
  File "/home/paula1/ASKCOS/env/lib/python2.7/site-packages/celery/app/trace.py", line 382, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/paula1/ASKCOS/env/lib/python2.7/site-packages/celery/app/trace.py", line 641, in __protected_call__
    return self.run(*args, **kwargs)
  File "/home/paula1/ASKCOS/askcos/askcos_site/askcos_celery/treebuilder/tb_c_worker.py", line 60, in get_top_precursors
    result = retroTransformer.get_outcomes(
AttributeError: 'NoneType' object has no attribute 'get_outcomes'

It seems that the global retroTransformer is still NoneType when its .get_outcomes method is called.

Have you seen a similar error, or do you have an idea for possible workarounds with v0.2.9? We're running this on a VM with the minimum required memory/CPU cores for ASKCOS v0.2.9, but upgrading to the current v0.4.1 is somewhat difficult due to our requirement to use a RHEL VM rather than Docker (and since your Docker images are Ubuntu-based, porting the Docker builds is not trivial).

Problem when trying to run individual modules

When I try to run the individual modules inside the 'makeit' folder, for example:

python tree_builder.py

I get this error:

ModuleNotFoundError: No module named 'makeit'

Please help me solve this problem, thank you!

Changes to banning chemicals

  • Allow authenticated users to avoid chemical bans
  • Enable ban by substructure

Create a new banned(smiles) function that accepts a SMILES string and returns whether or not the prediction should be made.
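
A minimal sketch of what such a function could look like, assuming RDKit and simple in-memory ban lists; all names here are hypothetical, and per the description above the function returns True when the prediction may proceed:

from rdkit import Chem

BANNED_SMILES = set()   # exact-match bans, as canonical SMILES (hypothetical store)
BANNED_SMARTS = []      # substructure bans, as SMARTS patterns (hypothetical store)

def banned(smiles, user_is_authenticated=False):
    """Return True if the prediction should be made for this SMILES."""
    if user_is_authenticated:
        return True  # authenticated users bypass chemical bans
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return True  # unparseable input; let downstream validation handle it
    if Chem.MolToSmiles(mol) in BANNED_SMILES:
        return False  # exact-match ban
    if any(mol.HasSubstructMatch(Chem.MolFromSmarts(p)) for p in BANNED_SMARTS):
        return False  # substructure ban
    return True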
