
DiffDock-PP

Implementation of DiffDock-PP: Rigid Protein-Protein Docking with Diffusion Models in PyTorch (ICLR 2023 - MLDD Workshop) by Mohamed Amine Ketata*, Cedrik Laue*, Ruslan Mammadov*, Hannes Stärk, Menghua Wu, Gabriele Corso, Céline Marquet, Regina Barzilay, Tommi S. Jaakkola.

DiffDock-PP is a new approach to rigid-body protein-protein docking based on a diffusion generative model that learns to translate and rotate unbound protein structures into their bound conformations, combined with a confidence model that learns to rank the poses generated by the score model and select the best one.

If you encounter any problem with the code, feel free to open an issue or contact [email protected].

(Figure: diffdock_pp.jpeg)

Setup

Installation

First, clone this repository

git clone https://github.com/ketatam/DiffDock-PP.git

Then, create a virtual environment to install the dependencies. We use python=3.10.8, but other recent Python versions should work as well.

conda create -n diffdock_pp python=3.10.8
conda activate diffdock_pp

Now, you can install the required packages

conda install pytorch=1.13.0 pytorch-cuda=11.6 -c pytorch -c nvidia

# install compatible pytorch geometric in this order WITH versions
pip install --no-cache-dir  torch-scatter==2.0.9 torch-sparse==0.6.15 torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.13.0+cu116.html

pip install numpy dill tqdm pyyaml pandas biopandas scikit-learn biopython e3nn wandb tensorboard tensorboardX matplotlib
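
After installation, you can optionally verify the environment with a short Python snippet (a minimal sketch, not part of the repository; it only imports packages named in the commands above and checks GPU visibility):

# sanity_check.py -- optional helper, not part of the repository.
# Verifies that the main dependencies installed above import correctly
# and that a CUDA device is visible to PyTorch.
import torch
import torch_geometric
import torch_scatter
import e3nn

print("torch           :", torch.__version__)          # expect 1.13.0
print("torch_geometric :", torch_geometric.__version__)
print("torch_scatter   :", torch_scatter.__version__)   # expect 2.0.9
print("e3nn            :", e3nn.__version__)
print("CUDA available  :", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU             :", torch.cuda.get_device_name(0))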

Note that the code was tested on Ubuntu 22.04 using NVIDIA RTX A6000 GPUs.

DIPS Data

To get the DIPS dataset, you can either follow the steps described in the EquiDock repo to download the raw data and process it into protein pairs for docking, or you can directly download the processed files (2.6 GB):

curl -L -o datasets/DIPS/dips.zip https://www.dropbox.com/s/sqknqofy58nlosh/DIPS.zip?dl=0
unzip datasets/DIPS/dips.zip -d datasets/DIPS/pairs_pruned

This should create the folder datasets/DIPS/pairs_pruned, which contains around 962 folders corresponding to the different protein pairs present in DIPS.

The data split that we used can be found in datasets/DIPS/data_file.csv.
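
As a rough check, you can count the extracted protein-pair folders and peek at the split file. The sketch below is illustrative and not part of the repository; it uses only the paths mentioned above and makes no assumption about the columns of data_file.csv:

# check_dips.py -- optional helper, not part of the repository.
import os
import pandas as pd

pairs_dir = "datasets/DIPS/pairs_pruned"
n_pairs = sum(os.path.isdir(os.path.join(pairs_dir, d)) for d in os.listdir(pairs_dir))
print("protein-pair folders:", n_pairs)  # should be around 962

# Inspect the data split file; the exact columns are whatever the repo ships.
split = pd.read_csv("datasets/DIPS/data_file.csv")
print(split.columns.tolist())
print(split.head())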

DB5.5 Data

We also support using the DB5.5 dataset, which can be downloaded from:

https://zlab.umassmed.edu/benchmark/

However, in this work we only focused on DIPS and therefore only release models that were trained on DIPS.

Running DiffDock-PP

Score Model Training

Our code supports multi-GPU training to accelerate the development of new models. To train a new score model from scratch, simply run

sh src/train.sh

In this bash file, you can specify the experimental setup, such as the number and IDs of the GPUs, the batch size, etc. You also need to specify a config file that defines the parameters related to the data, model, training, and inference.

The parameters that we used to train our score model can be found in config/dips_esm.yaml. This file specifies, among other things, the path to the data folder, the model hyperparameters, and the training and inference configuration.
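
If you want to inspect or tweak these parameters programmatically, the config is plain YAML and PyYAML is already installed. The snippet below is a minimal sketch, not part of the repository, and assumes nothing about the key names beyond whatever config/dips_esm.yaml actually contains:

# show_config.py -- optional helper, not part of the repository.
import yaml

with open("config/dips_esm.yaml") as f:
    cfg = yaml.safe_load(f)

# Print the top-level sections and their values so you can see
# what src/train.sh will pick up.
for key, value in cfg.items():
    print(key, ":", value)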

Note that the very first run takes longer because it processes the data and caches it for future runs. Also note that you need to set up a WandB account to log experiment results, or change the logger to "tensorboard".

Confidence Model Training

As described in the paper, the confidence model is used to rank multiple poses generated by the score model based on predicted confidence scores. As such, it is a classification network that is trained to predict the quality of the poses generated by the score model.
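
At inference time, the ranking step conceptually reduces to selecting the pose with the highest predicted confidence. The following is an illustration only, with toy values and names that are not the repository's API:

# Illustration only: rank N poses generated by the score model using the
# confidence model's predicted scores and keep the best one.
import numpy as np

confidence_scores = np.array([0.12, 0.87, 0.45, 0.63])  # one score per generated pose (toy values)
ranking = np.argsort(confidence_scores)[::-1]            # best pose first
best_pose_idx = ranking[0]
print("pose ranking :", ranking.tolist())                # [1, 3, 2, 0]
print("selected pose:", best_pose_idx)                   # 1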

To train the confidence model, you first need to create its training dataset by generating multiple samples using the score model:

sh src/generate_samples.sh

Then, you can start the actual training by running

sh src/train_confidence.sh

Trained Models

We provide the weights of our score model trained for 170 epochs on DIPS in checkpoints/large_model_dips.

We also provide the weights of our trained confidence model in checkpoints/confidence_model_dips.

Inference

To run inference on the validation or test set, run

sh src/inference.sh

As with training, you can specify all necessary configurations in the bash file and in the config files. The default configuration with the provided trained score and confidence models allows you to reproduce the numbers reported in the paper.

Note that if you want to test our models on your own dataset, the easiest way is to use the DB5Loader class defined in src/data/data_train_utils.py and name your PDB files {PDB_ID}_l_b.pdb and {PDB_ID}_r_b.pdb for the ligand and the receptor, respectively. For illustration, check out the script src/db5_inference.sh and the corresponding config file config/single_pair_inference.yaml, which run inference on a single pair located in datasets/single_pair_dataset.
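
For example, a small helper could copy your own structures into the expected layout. This is a hedged sketch, not part of the repository: the destination folder and source filenames are placeholders, and only the {PDB_ID}_l_b.pdb / {PDB_ID}_r_b.pdb naming convention comes from the description above:

# prepare_custom_pair.py -- illustrative helper, not part of the repository.
# Copies a ligand/receptor PDB pair into the naming scheme expected by DB5Loader.
import shutil
from pathlib import Path

def prepare_pair(ligand_pdb: str, receptor_pdb: str, pdb_id: str,
                 out_dir: str = "datasets/my_custom_dataset") -> None:
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    shutil.copy(ligand_pdb, out / f"{pdb_id}_l_b.pdb")
    shutil.copy(receptor_pdb, out / f"{pdb_id}_r_b.pdb")

# Example (paths are hypothetical):
# prepare_pair("my_ligand.pdb", "my_receptor.pdb", "1ABC")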

Visualization

To visualize the model's predictions, add the flags --visualization_path path/to/visualization/folder and --visualize_n_val_graphs NUMBER_OF_COMPLEXES_TO_VISUALIZE to the inference.sh script. The protein complex structure at each time step of the reverse diffusion process will then be saved as .pdb files, which you can visualize using, e.g., PyMOL.
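
If you want to open all snapshots of one complex in PyMOL at once, a sketch like the following can help. It is not part of the repository, and the visualization folder and filename ordering are assumptions; check the actual output names produced by your run:

# list_snapshots.py -- illustrative helper, not part of the repository.
# Collects the per-timestep .pdb files written during reverse diffusion and
# prints a PyMOL command that loads them.
from pathlib import Path

vis_dir = Path("path/to/visualization/folder")  # same path as --visualization_path
snapshots = sorted(vis_dir.glob("*.pdb"))       # assumes names sort in timestep order
print("pymol " + " ".join(str(p) for p in snapshots))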

Citation

@article{ketata2023diffdock,
  title={DiffDock-PP: Rigid Protein-Protein Docking with Diffusion Models},
  author={Ketata, Mohamed Amine and Laue, Cedrik and Mammadov, Ruslan and St{\"a}rk, Hannes and Wu, Menghua and Corso, Gabriele and Marquet, C{\'e}line and Barzilay, Regina and Jaakkola, Tommi S},
  journal={arXiv preprint arXiv:2304.03889},
  year={2023}
}

License

MIT
