# GAABind

Code and trained model for our paper *GAABind: A Geometry-Aware Attention-Based Network for Accurate Protein-Ligand Binding Pose and Binding Affinity Prediction*.
## Environment Setup

You can set up the environment using Anaconda or Docker.

### Using Anaconda

```shell
git clone https://github.com/Mercuryhs/GAABind.git
cd GAABind
conda env create -f environment.yml
conda activate gaabind
```

### Using Docker (pull the prebuilt image)

```shell
git clone https://github.com/Mercuryhs/GAABind.git
docker pull mercuryyy/gaabind:latest
docker run --shm-size 8G --gpus all -it -v $PWD/GAABind:/GAABind mercuryyy/gaabind:latest /bin/bash
```

### Using Docker (build from the Dockerfile)

```shell
git clone https://github.com/Mercuryhs/GAABind.git
cd GAABind
docker build -t gaabind:latest .
docker run --shm-size 8G --gpus all -it -v $PWD:/GAABind gaabind:latest /bin/bash
```
## Dataset

The processed dataset can be downloaded from Zenodo. If you want to train GAABind from scratch or reproduce the GAABind results, you can:

- download the dataset from Zenodo;
- unzip the zip file and place it into the `dataset` directory.
## Running GAABind on Your Own Dataset

We provide both a Jupyter notebook tutorial (`GAABind_example.ipynb`) and Python scripts for running GAABind on your own dataset. Organize each input complex as in the following example:

```
example_data/
├── Mpro-x11271
│   ├── ligand.txt
│   ├── pocket.txt
│   └── receptor.pdb
└── Mpro-x12715
    ├── ligand.txt
    ├── pocket.txt
    └── receptor.pdb
```
The ligand file can be in `.sdf`, `.mol2`, or `.mol` format, or you can provide the ligand's SMILES string in a `.txt` file. The `pocket.txt` file lists the binding-pocket residues, one per line; each residue is identified by its chain ID, residue number, and three-letter residue name, as follows:

```
A_166_GLU
A_49_MET
A_143_GLY
...
```
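The residue identifiers above are simple to split programmatically. As an illustration only (this helper is not part of the repository), a minimal Python sketch for reading `pocket.txt` entries might look like:

```python
from dataclasses import dataclass

@dataclass
class PocketResidue:
    chain: str   # chain ID, e.g. "A"
    number: int  # residue number, e.g. 166
    name: str    # three-letter residue name, e.g. "GLU"

def parse_pocket_file(lines):
    """Parse pocket.txt lines like 'A_166_GLU' into PocketResidue records."""
    residues = []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        chain, number, name = line.split("_")
        residues.append(PocketResidue(chain, int(number), name))
    return residues

pocket = parse_pocket_file(["A_166_GLU", "A_49_MET", "A_143_GLY"])
# pocket[0] -> PocketResidue(chain='A', number=166, name='GLU')
```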
Run the following command to preprocess the dataset:

```shell
python preprocess.py --input_path [raw_path] --output_path [processed_path]
```

Then run the prediction:

```shell
python predict.py --input_path [processed_path] --output_path [predict_path]
```
You can then find the prediction results in the `predict_path` directory.
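The preprocess and predict steps can also be chained from Python. The sketch below is illustrative only (the helper names are our own, not part of the repository); it builds both invocations and runs them in order with `subprocess`:

```python
import subprocess

def build_pipeline_cmds(raw_path, processed_path, predict_path):
    """Return the preprocess and predict invocations as argument lists."""
    preprocess = ["python", "preprocess.py",
                  "--input_path", raw_path,
                  "--output_path", processed_path]
    predict = ["python", "predict.py",
               "--input_path", processed_path,
               "--output_path", predict_path]
    return [preprocess, predict]

def run_pipeline(raw_path, processed_path, predict_path):
    # Run both steps sequentially; raise if either script exits non-zero.
    for cmd in build_pipeline_cmds(raw_path, processed_path, predict_path):
        subprocess.run(cmd, check=True)
```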
## Reproducing the Evaluation Results

Download and unzip the dataset, then place it into the `dataset` directory.

Run the following command to reproduce the evaluation results on CASF-2016:

```shell
python inference.py --input_path dataset/PDBBind/processed --complex_list dataset/PDBBind/test.txt --output_path casf2016_predict_result --batch_size 3
```
Run the following command to reproduce the evaluation results on the Mpro dataset:

```shell
python inference.py --input_path dataset/Mpro/processed --complex_list dataset/Mpro/test.txt --output_path mpro_predict_result --batch_size 2
```
## Training

You can also retrain the model yourself using the following command:

```shell
OMP_NUM_THREADS=8 CUDA_VISIBLE_DEVICES='0,1' torchrun --nproc_per_node=2 train.py --data_dir dataset/PDBBind
```
## Citation

Under review.