etiennereboul / optimol

This project forked from jacquesboitreaud/optimol

Optimization of binding affinities in chemical space for drug discovery

OptiMol

This is the code for the paper available at https://www.biorxiv.org/content/10.1101/2020.05.23.112201v2

This repo introduces two things:

  • A new Variational Auto-Encoder (VAE) architecture that maps a molecular graph to a sequence representation (in particular, SELFIES).
  • An optimization pipeline for a scoring function that includes docking.
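
Both components rely on the VAE's latent space. As a generic illustration (not this repo's graph encoder or SELFIES decoder), the sketch below shows two ingredients shared by standard VAEs: the reparameterization trick used to sample a latent vector from the encoder's Gaussian, and the closed-form KL divergence to a standard normal prior. Plain Python lists stand in for tensors, and the function names and example encoder outputs are hypothetical.

```python
import math
import random

def reparameterize(mu, logvar, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), independently per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]

def kl_to_standard_normal(mu, logvar):
    """Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))

# Hypothetical encoder output for a 4-dimensional latent space
mu = [0.5, -0.2, 0.0, 1.0]
logvar = [0.0, -1.0, 0.0, -0.5]
z = reparameterize(mu, logvar, random.Random(0))
print(len(z), round(kl_to_standard_normal(mu, logvar), 4))
```

In a real model, the KL term is added to the decoder's reconstruction loss, and the reparameterization keeps the sampling step differentiable with respect to `mu` and `logvar`.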

The required packages are listed in conda environment files (in ymls/) for CPU or CUDA 10 usage. For example, for CPU:

conda env create -f ymls/cpu.yml 

Otherwise, install the following packages manually:

pytorch, dgl, networkx, scikit-learn, rdkit, tqdm, ordered-sets, moses, pandas
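
To see which of these dependencies are missing before running any script, one option is to look each one up with `importlib.util.find_spec`, which locates a module without importing it. Import names differ from the package names listed above; the mapping below (e.g. pytorch to `torch`, scikit-learn to `sklearn`, ordered-sets to `ordered_set`) is our assumption, so adjust it if a package you know is installed is reported missing.

```python
import importlib.util

# Assumed mapping from the package names above to their Python import names
IMPORT_NAMES = {
    "pytorch": "torch",
    "dgl": "dgl",
    "networkx": "networkx",
    "scikit-learn": "sklearn",
    "rdkit": "rdkit",
    "tqdm": "tqdm",
    "ordered-sets": "ordered_set",
    "moses": "moses",
    "pandas": "pandas",
}

def missing_modules(import_names):
    """Return the import names that cannot be located in the current environment."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_modules(IMPORT_NAMES.values())
    print("Missing:", ", ".join(missing) if missing else "none")
```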

Prior model training

Data loading

We use Molecular Sets (MOSES, https://github.com/molecularsets/moses) to train our model. After installing the moses Python library, the data can be downloaded by running:

python data_processing/download_moses.py 

To train a graph2selfies model, SELFIES must first be precomputed for the training set. To compute SELFIES for a dataset stored in a CSV file, with the molecules in a column entitled 'smiles', run:

python data_processing/get_selfies.py -i [path_to_my_csv_dataset]
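
If your molecules are not yet in a CSV file, the sketch below writes them in the layout described above (one molecule per row under a 'smiles' header) using Python's standard `csv` module. The helper name, the file name `my_dataset.csv` and the example molecules are placeholders.

```python
import csv

def write_smiles_csv(path, smiles_list):
    """Write one SMILES string per row under a single 'smiles' header."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["smiles"])
        writer.writeheader()
        for smi in smiles_list:
            writer.writerow({"smiles": smi})

if __name__ == "__main__":
    # Placeholder molecules: ethanol, benzene, acetic acid
    write_smiles_csv("my_dataset.csv", ["CCO", "c1ccccc1", "CC(=O)O"])
```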

Model training

To train the model, run:

python train.py --train [my_dataset.csv] --n [your_model_name]

The CSV file must contain columns entitled 'smiles' and 'selfies'.
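
Since a missing column would only surface once training starts, it can help to check the header first. A minimal standard-library sketch; `check_columns` is a hypothetical helper, not part of the repository:

```python
import csv

def check_columns(path, required=("smiles", "selfies")):
    """Return the required column names that are absent from the CSV header row."""
    with open(path, newline="") as f:
        header = next(csv.reader(f), [])
    return [col for col in required if col not in header]
```

For example, `check_columns("my_dataset.csv")` returns an empty list when both columns are present, and `["selfies"]` if the SELFIES have not been computed yet.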

Embedding molecules

To compute embeddings for molecules in a CSV file, run:

python embed_mols.py -i [path_to_csv] --name [your_model_name] -v [smiles]/[selfies]

The column containing the SMILES or SELFIES strings should be labeled 'smiles' in both cases.
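
When this step is called from Python, for example inside a larger pipeline, a small wrapper can build the documented command and reject any `-v` value other than `smiles` or `selfies`. `embed_command` and `run_embedding` are hypothetical helpers, and passing the representation name as the bare `-v` value is our reading of the usage line above.

```python
import subprocess
import sys

def embed_command(csv_path, model_name, representation="smiles"):
    """Build the embed_mols.py command line shown above, validating -v."""
    if representation not in ("smiles", "selfies"):
        raise ValueError("representation must be 'smiles' or 'selfies'")
    # sys.executable reuses the interpreter of the active (e.g. conda) environment
    return [sys.executable, "embed_mols.py", "-i", csv_path,
            "--name", model_name, "-v", representation]

def run_embedding(csv_path, model_name, representation="smiles"):
    """Run the script from the repository root; raises CalledProcessError on failure."""
    subprocess.run(embed_command(csv_path, model_name, representation), check=True)
```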

Generating samples

To generate samples from a trained model, run:

python generate/sample_prior.py -N [number_of_samples] --name [name_of_the_model]
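
Assuming the standard normal prior of a vanilla VAE, sampling from the prior means drawing latent vectors z ~ N(0, I) and decoding them into molecules. The sketch below covers only the first step; `sample_latents` and the `latent_dim` value are placeholders (match the trained model's latent size), and decoding is left to generate/sample_prior.py.

```python
import random

def sample_latents(n_samples, latent_dim=64, seed=None):
    """Draw n_samples latent vectors, each with latent_dim independent N(0, 1) entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)] for _ in range(n_samples)]

z = sample_latents(5, latent_dim=8, seed=42)
```

Fixing `seed` makes a batch of samples reproducible across runs.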

Moses metrics

To compute the MOSES benchmark metrics for the samples (30k samples recommended), run:

python eval/moses_metrics.py -i [path_to_txt_with_samples]
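
For a quick sanity check before the full benchmark, two MOSES metrics, uniqueness and novelty, can be approximated on raw strings. This simplified sketch assumes the samples file holds one SMILES per line and compares strings directly; MOSES itself validates and canonicalizes molecules with RDKit first and reports uniqueness at fixed sample sizes (unique@1k, unique@10k), so use eval/moses_metrics.py for reported numbers.

```python
def read_samples(path):
    """Read one SMILES per line, skipping blank lines (assumed file layout)."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def uniqueness(samples):
    """Fraction of distinct strings among the samples."""
    return len(set(samples)) / len(samples) if samples else 0.0

def novelty(samples, train_smiles):
    """Fraction of distinct samples that do not appear in the training set."""
    unique = set(samples)
    if not unique:
        return 0.0
    return len(unique - set(train_smiles)) / len(unique)
```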

Scoring function optimization

This is mostly an efficient implementation of the CbAS (Conditioning by Adaptive Sampling) algorithm for docking. There are also two implementations of Bayesian optimization (BO) in /optim.
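
For readers new to CbAS, the core loop is: sample from the current search model, set a threshold at a quantile of the scores, weight each sample that reaches it by the ratio prior density / search density, and refit the search model on the weighted samples. The toy sketch below runs that loop in one dimension, with a Gaussian in place of the VAE and a quadratic standing in for the docking score; it omits the usual cap of the threshold at a target value and is an illustration under those assumptions, not the repository's implementation.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def cbas_1d(score, n_iter=20, n_samples=500, quantile=0.9, seed=0):
    """Toy CbAS: Gaussian search model, fixed N(0, 1) prior, deterministic oracle."""
    rng = random.Random(seed)
    prior_mu, prior_sigma = 0.0, 1.0   # stands in for the pretrained generative model
    mu, sigma = prior_mu, prior_sigma  # search model q_t starts at the prior
    for _ in range(n_iter):
        xs = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        ys = [score(x) for x in xs]
        # Threshold gamma_t: the chosen quantile of the current scores
        gamma = sorted(ys)[int(quantile * (n_samples - 1))]
        # Weight = prior(x) / q_t(x) for samples scoring >= gamma, else 0
        ws = [normal_pdf(x, prior_mu, prior_sigma) / normal_pdf(x, mu, sigma)
              if y >= gamma else 0.0
              for x, y in zip(xs, ys)]
        total = sum(ws)
        # Weighted maximum-likelihood refit of the Gaussian search model
        mu = sum(w * x for w, x in zip(ws, xs)) / total
        var = sum(w * (x - mu) ** 2 for w, x in zip(ws, xs)) / total
        sigma = max(math.sqrt(var), 1e-3)  # floor keeps the search model from collapsing
    return mu, sigma

mu, sigma = cbas_1d(lambda x: -(x - 3.0) ** 2)
print(f"search model: mean={mu:.2f}, std={sigma:.3f}")
```

The prior ratio is what distinguishes CbAS from plain cross-entropy-style search: it keeps the refitted model tied to the prior rather than drifting wherever the oracle happens to score well.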

OptiMol

Go to /cbas for the OptiMol optimization pipeline.

Contributors

jacquesboitreaud, vincentx15, nono9212, etiennereboul
