
This project forked from aspuru-guzik-group/tartarus


A Benchmarking Platform for Realistic And Practical Inverse Molecular Design

Home Page: https://arxiv.org/abs/2209.12487


Tartarus: Practical and Realistic Benchmarks for Inverse Molecular Design

This repository contains the code and results for the paper Tartarus, an open-source collection of benchmarks for evaluating generative models.

Benchmarking with Tartarus

Benchmarking with Docker

To run the Tartarus benchmark, we recommend using the provided Docker container. Instructions for building and running the benchmark locally are also provided under Installing from Source. The following steps walk through setup and evaluation and require Docker to be installed on your machine:

  1. Write the SMILES to be evaluated to a CSV file with a column header smiles.

  2. Pull the latest Tartarus Docker image:

    docker pull johnwilles/tartarus:latest
  3. Run the Docker container with the directory of your data mounted, the benchmark mode and the CSV input filename:
    docker run --rm -it -v ${LOCAL_PATH_TO_DATA}:/data johnwilles/tartarus:latest --mode ${BENCHMARK_MODE} --input_filename ${INPUT_FILENAME}
  4. The output file will be written to the same directory by default with the filename output.csv.
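
The input file and the run command described above can be sketched in Python as follows. This is only an illustration: the helper names `write_smiles_csv` and `docker_command`, the example molecules, the file name `input.csv`, and the `<BENCHMARK_MODE>` placeholder are assumptions made here, not part of Tartarus. The command is assembled but never executed.

```python
import csv
import tempfile
from pathlib import Path

IMAGE = "johnwilles/tartarus:latest"  # image name taken from the steps above


def write_smiles_csv(smiles, path):
    """Write SMILES strings to a CSV file with a single `smiles` column."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["smiles"])  # required column header
        for smi in smiles:
            writer.writerow([smi])


def docker_command(data_dir, mode, input_filename):
    """Assemble the `docker run` argument list; nothing is executed here."""
    return [
        "docker", "run", "--rm", "-it",
        "-v", f"{data_dir}:/data",
        IMAGE,
        "--mode", mode,
        "--input_filename", input_filename,
    ]


# Hypothetical usage: a temporary directory stands in for the data directory.
data_dir = Path(tempfile.mkdtemp())
write_smiles_csv(["CCO", "c1ccccc1"], data_dir / "input.csv")
cmd = docker_command(data_dir, "<BENCHMARK_MODE>", "input.csv")
print(" ".join(cmd))
```

Replace `<BENCHMARK_MODE>` with one of the modes supported by the container before running the printed command.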

Installing from Source

To install Tartarus locally, we recommend using the provided Conda environment definition.

  1. Clone the Tartarus repository.
    git clone git@github.com:aspuru-guzik-group/Tartarus.git
  2. Create a Conda environment.
    conda env create -f environment.yml
  3. Activate the tartarus Conda environment.
    conda activate tartarus
  4. Ensure that docking task executables have the correct permissions.
    chmod 777 tartarus/data/qvina
    chmod 777 tartarus/data/smina

Note: These executables are only compatible with Linux.
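
As a rough preflight check for the permission step above, the sketch below reports which paths are missing or lack the execute bit. The helper `check_executables` is hypothetical and not part of Tartarus, and a temporary file stands in for a real docking binary so the example can run anywhere.

```python
import os
import platform
import stat
import tempfile


def check_executables(paths):
    """Return the subset of paths that are missing or not executable."""
    return [p for p in paths if not (os.path.isfile(p) and os.access(p, os.X_OK))]


# Demonstration with a temporary file standing in for a docking binary.
with tempfile.TemporaryDirectory() as tmp:
    fake_binary = os.path.join(tmp, "qvina")
    open(fake_binary, "w").close()
    os.chmod(fake_binary, stat.S_IRUSR | stat.S_IWUSR)  # read/write only, no execute bit
    before = check_executables([fake_binary])
    os.chmod(fake_binary, stat.S_IRWXU)  # owner read/write/execute
    after = check_executables([fake_binary])

print("Not executable before chmod:", before)
print("Not executable after chmod:", after)
print("Running on Linux:", platform.system() == "Linux")
```

In a real checkout, the list passed to `check_executables` would be the two paths from the `chmod` commands above.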

Documentation

Detailed documentation can be found here: Tartarus Docs

Getting started

Below are some examples of how to load the datasets and use the fitness functions. For more details, you can also look at example.py.

Datasets

All datasets are found in the datasets directory. The arrows indicate the goal (↑ = maximization, ↓ = minimization).

| Task | Dataset name | # of SMILES | Columns in file |
| --- | --- | --- | --- |
| Designing OPV | hce.csv | 24,953 | PCE_PCBM - SAS (↑), PCE_PCDTBT - SAS (↑) |
| Designing emitters | gdb13.csv | 403,947 | Singlet-triplet gap (↓), Oscillator strength (↑), Multi-objective (↑) |
| Designing drugs | docking.csv | 152,296 | 1SYH (↓), 6Y2F (↓), 4LDE (↓) |
| Designing chemical reaction substrates | reactivity.csv | 60,828 | Activation energy ΔE (↓), Reaction energy ΔE_r (↓), ΔE + ΔE_r (↓), -ΔE + ΔE_r (↓) |
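
To illustrate how the arrows translate into a ranking, here is a minimal sketch that picks the best rows for a maximized and a minimized objective. The column names `objective_up` and `objective_down`, the toy values, and the helper `top_k` are placeholders invented for this example; the real CSV headers may differ.

```python
import csv
import io


def top_k(rows, column, maximize, k=3):
    """Return the k rows with the best value in `column` for the given goal."""
    return sorted(rows, key=lambda row: float(row[column]), reverse=maximize)[:k]


# Toy data standing in for a dataset file; column names are placeholders.
toy_csv = io.StringIO(
    "smiles,objective_up,objective_down\n"
    "C,0.1,-1.0\n"
    "CC,0.9,2.0\n"
    "CCC,0.5,0.5\n"
    "CCCC,0.3,3.0\n"
)
rows = list(csv.DictReader(toy_csv))

# An objective marked with an up arrow is ranked from largest to smallest.
best_up = [row["smiles"] for row in top_k(rows, "objective_up", maximize=True, k=2)]
# An objective marked with a down arrow is ranked from smallest to largest.
best_down = [row["smiles"] for row in top_k(rows, "objective_down", maximize=False, k=2)]
print(best_up, best_down)
```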

Designing organic photovoltaics

To use the evaluation function, either run the full xtb calculation with the pce module or use the surrogate model with pretrained weights.

import pandas as pd
data = pd.read_csv('./datasets/hce.csv')   # or ./datasets/unbiased_hce.csv
smiles = data['smiles'].tolist()
smi = smiles[0]

## use full xtb calculation in pce module
from tartarus import pce
dipm, gap, lumo, combined, pce_pcbm_sas, pce_pcdtbt_sas = pce.get_properties(smi)

## use pretrained surrogate model
dipm, gap, lumo, combined = pce.get_surrogate_properties(smi)
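
When scoring many molecules, it can help to keep going if a single evaluation fails. The sketch below assumes that a property function may raise an exception for some inputs, which the source does not state. `evaluate_batch` and `fake_properties` are hypothetical names, and the stand-in function only mimics the four-value return shape of the surrogate call above.

```python
def evaluate_batch(smiles_list, property_fn):
    """Apply property_fn to each SMILES and record failures instead of stopping."""
    results, failures = {}, {}
    for smi in smiles_list:
        try:
            results[smi] = property_fn(smi)
        except Exception as err:  # broad on purpose: failure modes are unknown here
            failures[smi] = repr(err)
    return results, failures


def fake_properties(smi):
    """Stand-in returning four numbers; NOT the real Tartarus calculation."""
    if not smi:
        raise ValueError("empty SMILES")
    n = float(len(smi))
    return (n, n + 1, n + 2, n + 3)


results, failures = evaluate_batch(["CCO", "", "C"], fake_properties)
print(results)
print(failures)
```

In practice, `fake_properties` would be replaced by `pce.get_surrogate_properties` or `pce.get_properties`.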

Designing Organic Emitters

Load the objective functions from the tadf module. All three fitness functions are returned for each SMILES string.

import pandas as pd
data = pd.read_csv('./datasets/gdb13.csv')  
smiles = data['smiles'].tolist()
smi = smiles[0]

## use full xtb calculation in tadf module
from tartarus import tadf
st, osc, combined = tadf.get_properties(smi)
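
Because the three values have different goal directions, comparing two candidates takes some care. The sketch below uses a Pareto-dominance check, a generic technique that is not taken from Tartarus. It assumes the returned values are the raw quantities with the directions listed in the datasets table (singlet-triplet gap minimized, oscillator strength and multi-objective value maximized). `EmitterScores`, `dominates`, and the numbers are made up for illustration.

```python
from typing import NamedTuple


class EmitterScores(NamedTuple):
    st: float        # singlet-triplet gap, lower is better (assumed raw value)
    osc: float       # oscillator strength, higher is better
    combined: float  # multi-objective value, higher is better


def dominates(a, b):
    """True if `a` is at least as good as `b` on every objective and differs on one."""
    a_vec = (-a.st, a.osc, a.combined)  # flip the sign so larger is always better
    b_vec = (-b.st, b.osc, b.combined)
    return all(x >= y for x, y in zip(a_vec, b_vec)) and a_vec != b_vec


# Made-up scores for two hypothetical molecules.
first = EmitterScores(st=0.2, osc=0.5, combined=0.1)
second = EmitterScores(st=0.4, osc=0.3, combined=0.1)
print(dominates(first, second), dominates(second, first))
```

A tuple returned by `tadf.get_properties(smi)` could be wrapped as `EmitterScores(*values)` under the same assumption about value order.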

Design of Drug Molecules

Load the docking module. There are separate functions for each of the proteins, as shown below.

import pandas as pd
data = pd.read_csv('./datasets/docking.csv')  
smiles = data['smiles'].tolist()
smi = smiles[0]

## Design of Protein Ligands 
from tartarus import docking
score_1syh = docking.get_1syh_score(smi)
score_6y2f = docking.get_6y2f_score(smi)
score_4lde = docking.get_4lde_score(smi)
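
Since each protein has its own function, a small dictionary can make it easier to score one molecule against all targets or to pick the best ligand for one target. This is a sketch with stand-in scoring functions. `score_all_targets`, `best_ligand`, and `fake_scorers` are hypothetical, and lower scores are treated as better, following the down arrows in the datasets table.

```python
def score_all_targets(smi, scorers):
    """Call each target's scoring function and collect results by target name."""
    return {target: fn(smi) for target, fn in scorers.items()}


def best_ligand(smiles_list, scorers, target):
    """Return the SMILES with the lowest (best) score for one target."""
    return min(smiles_list, key=lambda smi: scorers[target](smi))


# Stand-in scorers; these are NOT real docking calculations.
fake_scorers = {
    "1syh": lambda smi: -1.0 * len(smi),
    "6y2f": lambda smi: -0.5 * len(smi),
    "4lde": lambda smi: float(len(smi)),
}

scores = score_all_targets("CCO", fake_scorers)
print(scores)
print(best_ligand(["C", "CCO"], fake_scorers, "1syh"))
```

With Tartarus installed, the dictionary values would be `docking.get_1syh_score`, `docking.get_6y2f_score`, and `docking.get_4lde_score`.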

Design of Chemical Reaction Substrates

Load the reactivity module. All four fitness functions are returned for each SMILES string.

import pandas as pd
data = pd.read_csv('./datasets/reactivity.csv')  
smiles = data['smiles'].tolist()
smi = smiles[0]

## calculate the reactivity properties of the substrate
from tartarus import reactivity
Ea, Er, sum_Ea_Er, diff_Ea_Er = reactivity.get_properties(smi)
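
The two combined objectives can be illustrated with plain arithmetic. The sketch below assumes that the third and fourth returned values correspond to the sum and the difference listed in the datasets table. The helper `combined_objectives` and the numbers are invented for this example, and no units are implied.

```python
def combined_objectives(activation_energy, reaction_energy):
    """Return (activation + reaction, -activation + reaction) as in the table."""
    total = activation_energy + reaction_energy
    difference = -activation_energy + reaction_energy
    return total, difference


# Made-up values: activation energy 10.0 and reaction energy -4.0.
total, difference = combined_objectives(10.0, -4.0)
print(total, difference)
```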

Results

Our results for running the corresponding benchmarks can be found here:

Questions, problems?

Open a GitHub issue 😄 and please be as clear and descriptive as possible. You are also welcome to contact the authors directly: (akshat98[AT]stanford[DOT]edu, robert[DOT]pollice[AT]gmail[DOT]com)

License

Apache License 2.0

Contributors

akshat998, jwilles, gkwt
