gnina / libmolgrid

Comprehensive library for fast, GPU accelerated molecular gridding for deep learning workflows

Home Page: https://gnina.github.io/libmolgrid/

License: Apache License 2.0

CMake 3.57% Python 18.10% C++ 66.21% Cuda 11.76% Makefile 0.22% Batchfile 0.14%

libmolgrid's Introduction


gnina (pronounced NEE-na) is a molecular docking program with integrated support for scoring and optimizing ligands using convolutional neural networks. It is a fork of smina, which is a fork of AutoDock Vina.

Help

Please subscribe to our slack team. An example Colab notebook showing how to use gnina is available here. We also hosted a workshop on using gnina (video, slides).

Citation

If you find gnina useful, please cite our paper(s):

GNINA 1.0: Molecular docking with deep learning (Primary application citation)
A McNutt, P Francoeur, R Aggarwal, T Masuda, R Meli, M Ragoza, J Sunseri, DR Koes. J. Cheminformatics, 2021
link PubMed ChemRxiv

Protein–Ligand Scoring with Convolutional Neural Networks (Primary methods citation)
M Ragoza, J Hochuli, E Idrobo, J Sunseri, DR Koes. J. Chem. Inf. Model, 2017
link PubMed arXiv

Ligand pose optimization with atomic grid-based convolutional neural networks
M Ragoza, L Turner, DR Koes. Machine Learning for Molecules and Materials NIPS 2017 Workshop, 2017
arXiv

Visualizing convolutional neural network protein-ligand scoring
J Hochuli, A Helbling, T Skaist, M Ragoza, DR Koes. Journal of Molecular Graphics and Modelling, 2018
link PubMed arXiv

Convolutional neural network scoring and minimization in the D3R 2017 community challenge
J Sunseri, JE King, PG Francoeur, DR Koes. Journal of computer-aided molecular design, 2018
link PubMed

Three-Dimensional Convolutional Neural Networks and a Cross-Docked Data Set for Structure-Based Drug Design
PG Francoeur, T Masuda, J Sunseri, A Jia, RB Iovanisci, I Snyder, DR Koes. J. Chem. Inf. Model, 2020
link PubMed Chemrxiv

Virtual Screening with Gnina 1.0
J Sunseri, DR Koes. Molecules, 2021
link Preprints

Docker

A pre-built docker image is available here and Dockerfiles are here.

Installation

We recommend that you use the pre-built binary unless you have significant experience building software on Linux, in which case building from source might result in an executable more optimized for your system.

Ubuntu 22.04

apt-get install build-essential git cmake wget libboost-all-dev libeigen3-dev libgoogle-glog-dev libprotobuf-dev protobuf-compiler libhdf5-dev libatlas-base-dev python3-dev librdkit-dev python3-numpy python3-pip python3-pytest libjsoncpp-dev

Follow NVIDIA's instructions to install the latest version of CUDA (>= 11.0 is required). Make sure nvcc is in your PATH.

Optionally install cuDNN.

Install OpenBabel3. Note there are errors in bond order determination in version 3.1.1 and older.

git clone https://github.com/openbabel/openbabel.git
cd openbabel
mkdir build
cd build
cmake -DWITH_MAEPARSER=OFF -DWITH_COORDGEN=OFF -DPYTHON_BINDINGS=ON -DRUN_SWIG=ON ..
make
make install

Install gnina

git clone https://github.com/gnina/gnina.git
cd gnina
mkdir build
cd build
cmake ..
make
make install

For reference, a complete sequence of commands for building on a fresh Ubuntu 22.04 system:

sudo apt-get remove nvidia-cuda-toolkit
wget https://developer.download.nvidia.com/compute/cuda/12.4.0/local_installers/cuda_12.4.0_550.54.14_linux.run
chmod 700 cuda_12.4.0_550.54.14_linux.run
sudo sh cuda_12.4.0_550.54.14_linux.run
wget https://developer.download.nvidia.com/compute/cudnn/9.0.0/local_installers/cudnn-local-repo-ubuntu2204-9.0.0_1.0-1_amd64.deb
sudo dpkg -i cudnn-local-repo-ubuntu2204-9.0.0_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-ubuntu2204-9.0.0/cudnn-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cudnn-cuda-12
apt-get install build-essential git cmake wget libboost-all-dev libeigen3-dev libgoogle-glog-dev libprotobuf-dev protobuf-compiler libhdf5-dev libatlas-base-dev python3-dev librdkit-dev python3-numpy python3-pip python3-pytest libjsoncpp-dev

git clone https://github.com/openbabel/openbabel.git
cd openbabel
mkdir build
cd build
cmake -DWITH_MAEPARSER=OFF -DWITH_COORDGEN=OFF -DPYTHON_BINDINGS=ON -DRUN_SWIG=ON ..
make -j8
sudo make install

git clone https://github.com/gnina/gnina.git
cd gnina
mkdir build
cd build
cmake ..
make -j8
sudo make install

If you are building for systems with different GPUs (e.g. in a cluster environment), configure with -DCUDA_ARCH_NAME=All.
Note that the cmake build will automatically fetch and install libmolgrid if it is not already installed.
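
For example, from the gnina build directory:

cmake -DCUDA_ARCH_NAME=All ..
make -j8
sudo make install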

The scripts provided in gnina/scripts have additional python dependencies that must be installed.

Usage

To dock ligand lig.sdf to a binding site on rec.pdb defined by another ligand orig.sdf:

gnina -r rec.pdb -l lig.sdf --autobox_ligand orig.sdf -o docked.sdf.gz

To perform docking with flexible sidechain residues within 3.5 Angstroms of orig.sdf (generally not recommended unless prior knowledge indicates the pocket is highly flexible):

gnina -r rec.pdb -l lig.sdf --autobox_ligand orig.sdf --flexdist_ligand orig.sdf --flexdist 3.5 -o flex_docked.sdf.gz

To perform whole protein docking:

gnina -r rec.pdb -l lig.sdf --autobox_ligand rec.pdb -o whole_docked.sdf.gz --exhaustiveness 64

To utilize the default ensemble CNN in the energy minimization during the refinement step of docking (10 times slower than the default rescore option):

gnina -r rec.pdb -l lig.sdf --autobox_ligand orig.sdf --cnn_scoring refinement -o cnn_refined.sdf.gz

To utilize the default ensemble CNN for every step of docking (1000 times slower than the default rescore option):

gnina -r rec.pdb -l lig.sdf --autobox_ligand orig.sdf --cnn_scoring all -o cnn_all.sdf.gz

To utilize only empirical scoring with the Vinardo scoring function:

gnina -r rec.pdb -l lig.sdf --autobox_ligand orig.sdf --scoring vinardo --cnn_scoring none -o vinardo_docked.sdf.gz

To utilize a different CNN during docking (see help for possible options):


gnina -r rec.pdb -l lig.sdf --autobox_ligand orig.sdf --cnn dense -o dense_docked.sdf.gz

To minimize and score ligands ligs.sdf already positioned in a binding site:

gnina -r rec.pdb -l ligs.sdf --minimize -o minimized.sdf.gz

To covalently dock a pyrazole to a specific iron atom on the receptor, with the bond formed between a nitrogen of the pyrazole and the iron:

gnina  -r rec.pdb.gz -l conformer.sdf.gz --autobox_ligand bindingsite.sdf.gz --covalent_rec_atom A:601:FE --covalent_lig_atom_pattern '[$(n1nccc1)]' -o output.sdf.gz 

The same as above, but with the covalently bonding ligand atom manually positioned (instead of using OpenBabel binding heuristics) and the ligand/residue complex UFF optimized:

gnina  -r rec.pdb.gz -l conformer.sdf.gz --autobox_ligand bindingsite.sdf.gz --covalent_lig_atom_position -11.796,31.887,72.682  --covalent_optimize_lig  --covalent_rec_atom A:601:FE --covalent_lig_atom_pattern '[$(n1nccc1)]' -o output.sdf.gz 

All options:

Input:
  -r [ --receptor ] arg              rigid part of the receptor
  --flex arg                         flexible side chains, if any (PDBQT)
  -l [ --ligand ] arg                ligand(s)
  --flexres arg                      flexible side chains specified by comma 
                                     separated list of chain:resid
  --flexdist_ligand arg              Ligand to use for flexdist
  --flexdist arg                     set all side chains within specified 
                                     distance to flexdist_ligand to flexible
  --flex_limit arg                   Hard limit for the number of flexible 
                                     residues
  --flex_max arg                     Retain at most the closest flex_max 
                                     flexible residues

Search space (required):
  --center_x arg                     X coordinate of the center
  --center_y arg                     Y coordinate of the center
  --center_z arg                     Z coordinate of the center
  --size_x arg                       size in the X dimension (Angstroms)
  --size_y arg                       size in the Y dimension (Angstroms)
  --size_z arg                       size in the Z dimension (Angstroms)
  --autobox_ligand arg               Ligand to use for autobox
  --autobox_add arg                  Amount of buffer space to add to 
                                     auto-generated box (default +4 on all six 
                                     sides)
  --autobox_extend arg (=1)          Expand the autobox if needed to ensure the
                                     input conformation of the ligand being 
                                     docked can freely rotate within the box.
  --no_lig                           no ligand; for sampling/minimizing 
                                     flexible residues

Covalent docking:
  --covalent_rec_atom arg            Receptor atom ligand is covalently bound 
                                     to.  Can be specified as 
                                     chain:resnum:atom_name or as x,y,z 
                                     Cartesian coordinates.
  --covalent_lig_atom_pattern arg    SMARTS expression for ligand atom that 
                                     will covalently bind protein.
  --covalent_lig_atom_position arg   Optional.  Initial placement of covalently
                                     bonding ligand atom in x,y,z Cartesian 
                                     coordinates.  If not specified, 
                                     OpenBabel's GetNewBondVector function will
                                     be used to position ligand.
  --covalent_fix_lig_atom_position   If covalent_lig_atom_position is 
                                     specified, fix the ligand atom to this 
                                     position as opposed to using this position
                                     to define the initial structure.
  --covalent_bond_order arg (=1)     Bond order of covalent bond. Default 1.
  --covalent_optimize_lig            Optimize the covalent complex of ligand 
                                     and residue using UFF. This will change 
                                     bond angles and lengths of the ligand.

Scoring and minimization options:
  --scoring arg                      specify alternative built-in scoring 
                                     function: ad4_scoring default dkoes_fast 
                                     dkoes_scoring dkoes_scoring_old vina 
                                     vinardo
  --custom_scoring arg               custom scoring function file
  --custom_atoms arg                 custom atom type parameters file
  --score_only                       score provided ligand pose
  --local_only                       local search only using autobox (you 
                                     probably want to use --minimize)
  --minimize                         energy minimization
  --randomize_only                   generate random poses, attempting to avoid
                                     clashes
  --num_mc_steps arg                 fixed number of monte carlo steps to take 
                                     in each chain
  --max_mc_steps arg                 cap on number of monte carlo steps to take
                                     in each chain
  --num_mc_saved arg                 number of top poses saved in each monte 
                                     carlo chain
  --temperature arg                  temperature for metropolis accept 
                                     criterion
  --minimize_iters arg (=0)          number iterations of steepest descent; 
                                     default scales with rotors and usually 
                                     isn't sufficient for convergence
  --accurate_line                    use accurate line search
  --simple_ascent                    use simple gradient ascent
  --minimize_early_term              Stop minimization before convergence 
                                     conditions are fully met.
  --minimize_single_full             During docking perform a single full 
                                     minimization instead of a truncated 
                                     pre-evaluate followed by a full.
  --approximation arg                approximation (linear, spline, or exact) 
                                     to use
  --factor arg                       approximation factor: higher results in a 
                                     finer-grained approximation
  --force_cap arg                    max allowed force; lower values more 
                                     gently minimize clashing structures
  --user_grid arg                    Autodock map file for user grid data based
                                     calculations
  --user_grid_lambda arg (=-1)       Scales user_grid and functional scoring
  --print_terms                      Print all available terms with default 
                                     parameterizations
  --print_atom_types                 Print all available atom types

Convolutional neural net (CNN) scoring:
  --cnn_scoring arg (=1)             Amount of CNN scoring: none, rescore 
                                     (default), refinement, metrorescore 
                                     (metropolis+rescore), metrorefine 
                                     (metropolis+refine), all
  --cnn arg                          built-in model to use, specify 
                                     PREFIX_ensemble to evaluate an ensemble of
                                     models starting with PREFIX: 
                                     crossdock_default2018 
                                     crossdock_default2018_1 
                                     crossdock_default2018_2 
                                     crossdock_default2018_3 
                                     crossdock_default2018_4 default2017 dense 
                                     dense_1 dense_2 dense_3 dense_4 
                                     general_default2018 general_default2018_1 
                                     general_default2018_2 
                                     general_default2018_3 
                                     general_default2018_4 redock_default2018 
                                     redock_default2018_1 redock_default2018_2 
                                     redock_default2018_3 redock_default2018_4
  --cnn_model arg                    caffe cnn model file; if not specified a 
                                     default model will be used
  --cnn_weights arg                  caffe cnn weights file (*.caffemodel); if 
                                     not specified default weights (trained on 
                                     the default model) will be used
  --cnn_resolution arg (=0.5)        resolution of grids, don't change unless 
                                     you really know what you are doing
  --cnn_rotation arg (=0)            evaluate multiple rotations of pose (max 
                                     24)
  --cnn_update_min_frame arg (=1)    During minimization, recenter coordinate 
                                     frame as ligand moves
  --cnn_freeze_receptor              Don't move the receptor with respect to a 
                                     fixed coordinate system
  --cnn_mix_emp_force                Merge CNN and empirical minus forces
  --cnn_mix_emp_energy               Merge CNN and empirical energy
  --cnn_empirical_weight arg (=1)    Weight for scaling and merging empirical 
                                     force and energy 
  --cnn_outputdx                     Dump .dx files of atom grid gradient.
  --cnn_outputxyz                    Dump .xyz files of atom gradient.
  --cnn_xyzprefix arg (=gradient)    Prefix for atom gradient .xyz files
  --cnn_center_x arg                 X coordinate of the CNN center
  --cnn_center_y arg                 Y coordinate of the CNN center
  --cnn_center_z arg                 Z coordinate of the CNN center
  --cnn_verbose                      Enable verbose output for CNN debugging

Output:
  -o [ --out ] arg                   output file name, format taken from file 
                                     extension
  --out_flex arg                     output file for flexible receptor residues
  --log arg                          optionally, write log file
  --atom_terms arg                   optionally write per-atom interaction term
                                     values
  --atom_term_data                   embedded per-atom interaction terms in 
                                     output sd data
  --pose_sort_order arg (=0)         How to sort docking results: CNNscore 
                                     (default), CNNaffinity, Energy
  --full_flex_output                 Output entire structure for out_flex, not 
                                     just flexible residues.

Misc (optional):
  --cpu arg                          the number of CPUs to use (the default is 
                                     to try to detect the number of CPUs or, 
                                     failing that, use 1)
  --seed arg                         explicit random seed
  --exhaustiveness arg (=8)          exhaustiveness of the global search 
                                     (roughly proportional to time)
  --num_modes arg (=9)               maximum number of binding modes to 
                                     generate
  --min_rmsd_filter arg (=1)         rmsd value used to filter final poses to 
                                     remove redundancy
  -q [ --quiet ]                     Suppress output messages
  --addH arg                         automatically add hydrogens in ligands (on
                                     by default)
  --stripH arg                       remove hydrogens from molecule _after_ 
                                     performing atom typing for efficiency (off
                                     by default)
  --device arg (=0)                  GPU device to use
  --no_gpu                           Disable GPU acceleration, even if 
                                     available.

Configuration file (optional):
  --config arg                       the above options can be put here

Information (optional):
  --help                             display usage summary
  --help_hidden                      display usage summary with hidden options
  --version                          display program version


CNN Scoring

--cnn_scoring determines at which points of the docking procedure the CNN scoring function is used.

  • none - No CNNs used for docking. Uses the specified empirical scoring function throughout.
  • rescore (default) - CNN used for reranking of final poses. Least computationally expensive CNN option.
  • refinement - CNN used to refine poses after Monte Carlo chains and for final ranking of output poses. 10x slower than rescore when using a GPU.
  • all - CNN used as the scoring function throughout the whole procedure. Extremely computationally intensive and not recommended.

The default CNN scoring function is an ensemble of 5 models selected to balance pose prediction performance and runtime: dense, general_default2018_3, dense_3, crossdock_default2018, and redock_default2018. More information on these various models can be found in the papers listed above.
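
For example, following the PREFIX_ensemble convention described in the option help above, the five dense models can be evaluated together as an ensemble during rescoring (the output file name is illustrative):

gnina -r rec.pdb -l lig.sdf --autobox_ligand orig.sdf --cnn dense_ensemble -o dense_ensemble_docked.sdf.gz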

Training

Scripts to aid in training new CNN models can be found at https://github.com/gnina/scripts and sample models at https://github.com/gnina/models.

The DUD-E docked poses used in the original paper can be found here and the CrossDocked2020 set is here.

License

gnina is dual licensed under GPL and Apache. The GPL license is necessitated by the use of OpenBabel (which is GPL licensed). In order to use gnina under the Apache license only, all references to OpenBabel must be removed from the source code.

libmolgrid's People

Contributors

dependabot[bot], dkoes, drewnutt, jsunseri, luc1100, rishalaggarwal, rmeli


libmolgrid's Issues

Grids are not centered when dimension is not divisible by resolution

The centering of grids appears to depend on the dimension argument: the output grids are only centered at the provided center if the dimension is divisible by the resolution. See the following example:

import molgrid, torch

# single atom with radius=1.0 located at (0,0,0)
coords = torch.zeros((1, 3), device='cuda')
types = torch.ones((1, 1), device='cuda')
radii = torch.ones((1,), device='cuda')

# two grid makers with resolution=1.0 and center=(0,0,0)
#   with same size (# points) but different "dimensions"

gridder1 = molgrid.Coords2Grid(
    gmaker=molgrid.GridMaker(
        resolution=1.0,
        dimension=2.0,
    ),
    center=(0,0,0)
)
gridder2 = molgrid.Coords2Grid(
    gmaker=molgrid.GridMaker(
        resolution=1.0,
        dimension=1.9,
    ),
    center=(0,0,0)
)

grid1 = gridder1.forward(coords, types, radii)
grid2 = gridder2.forward(coords, types, radii)

print('grids have the same shape:', grid1.shape == grid2.shape)
print('grids have the same values:', (grid1 == grid2).all().item())

m = tuple(dim//2 for dim in grid1.shape) # midpoint index
print('grid1 is centered at (0,0,0):', (grid1[m] == 1.0).item())
print('grid2 is centered at (0,0,0):', (grid2[m] == 1.0).item())

def check_symmetry(grid):
    return (
        (grid == grid.flip(dims=(1,))).all() and
        (grid == grid.flip(dims=(2,))).all() and
        (grid == grid.flip(dims=(3,))).all()
    ).item()

print('grid1 is symmetric:', check_symmetry(grid1))
print('grid2 is symmetric:', check_symmetry(grid2))

Which produces this output:

grids have the same shape: True
grids have the same values: False
grid1 is centered at (0,0,0): True
grid2 is centered at (0,0,0): False
grid1 is symmetric: True
grid2 is symmetric: False
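
A workaround consistent with the example above (a sketch, not a fix of the underlying behavior): choose dimension as an exact multiple of resolution so the requested center falls on a grid point.

import molgrid

# dimension an exact multiple of resolution -> centered, symmetric grid
gmaker = molgrid.GridMaker(resolution=1.0, dimension=2.0)
# whereas dimension=1.9 (not a multiple of 1.0) produces the off-center grid shown above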

cmake error

When I run cmake .., I get the following error:
CMake Error at /opt/conda/lib/cmake/Boost-1.70.0/BoostConfig.cmake:95 (find_package):
Could not find a package configuration file provided by "boost_python38"
(requested version 1.70.0) with any of the following names:

boost_python38Config.cmake
boost_python38-config.cmake

Add the installation prefix of "boost_python38" to CMAKE_PREFIX_PATH or set
"boost_python38_DIR" to a directory containing one of the above files. If
"boost_python38" provides a separate development package or SDK, be sure it
has been installed.
Call Stack (most recent call first):
/opt/conda/lib/cmake/Boost-1.70.0/BoostConfig.cmake:124 (boost_find_dependency)
/opt/conda/share/cmake-3.19/Modules/FindBoost.cmake:460 (find_package)
python/CMakeLists.txt:17 (find_package)

Troubles compiling libmolgrid

Working platform:

  • CentOS 7.4
  • GCC 6.2
  • Python 3.6
  • Boost 1.72
  • CMake 3.12
  • CUDA 9.0

Below is the output when I just run cmake in the build directory.

-- The C compiler identification is GNU 6.2.0
-- The CXX compiler identification is GNU 6.2.0
-- The CUDA compiler identification is NVIDIA 9.0.176
-- Check for working C compiler: /usr/local/gcc-6.2/bin/gcc
-- Check for working C compiler: /usr/local/gcc-6.2/bin/gcc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/local/gcc-6.2/bin/g++
-- Check for working CXX compiler: /usr/local/gcc-6.2/bin/g++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Found Git: /usr/bin/git
-- Current git revision is
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found CUDA: /usr/local/cuda (found version "9.0")
-- Boost  found.
-- Found Boost components:
   regex;unit_test_framework;program_options;system;filesystem;iostreams
-- Found Open Babel include files at /usr/local/include/openbabel3
-- Found Open Babel library at /lib64/libopenbabel.so
Setting openbabel found TRUE
-- Found PythonLibs: /usr/lib64/libpython3.6m.so (found version "3.6.8")
-- Found PythonInterp: /usr/bin/python3.6 (found version "3.6.8")
CMake Error at /usr/local/lib/cmake/Boost-1.72.0/BoostConfig.cmake:120 (find_package):
  Could not find a package configuration file provided by "boost_python3"
  (requested version 1.72.0) with any of the following names:

    boost_python3Config.cmake
    boost_python3-config.cmake

  Add the installation prefix of "boost_python3" to CMAKE_PREFIX_PATH or set
  "boost_python3_DIR" to a directory containing one of the above files.  If
  "boost_python3" provides a separate development package or SDK, be sure it
  has been installed.
Call Stack (most recent call first):
  /usr/local/lib/cmake/Boost-1.72.0/BoostConfig.cmake:185 (boost_find_component)
  /usr/local/share/cmake-3.12/Modules/FindBoost.cmake:261 (find_package)
  python/CMakeLists.txt:14 (find_package)


-- Configuring incomplete, errors occurred!
See also "/home/wangqi/wangqi/downloads/libmolgrid-master/build/CMakeFiles/CMakeOutput.log".
See also "/home/wangqi/wangqi/downloads/libmolgrid-master/build/CMakeFiles/CMakeError.log".

I believe Boost was properly installed since the following files are found in my /usr/local/lib:

libboost_python3.a
libboost_python3.so
libboost_python36.a
libboost_python36.so
libboost_python36.so.1
libboost_python36.so.1.72
libboost_python36.so.1.72.0

After some research, I added set(Boost_NO_BOOST_CMAKE true) to CMakeLists.txt (everything else is untouched) to make it go through:

-- The C compiler identification is GNU 6.2.0
-- The CXX compiler identification is GNU 6.2.0
-- The CUDA compiler identification is NVIDIA 9.0.176
-- Check for working C compiler: /usr/local/gcc-6.2/bin/gcc
-- Check for working C compiler: /usr/local/gcc-6.2/bin/gcc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/local/gcc-6.2/bin/g++
-- Check for working CXX compiler: /usr/local/gcc-6.2/bin/g++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Found Git: /usr/bin/git
-- Current git revision is
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found CUDA: /usr/local/cuda (found version "9.0")
CMake Warning at /usr/local/share/cmake-3.12/Modules/FindBoost.cmake:847 (message):
  New Boost version may have incorrect or missing dependencies and imported
  targets
Call Stack (most recent call first):
  /usr/local/share/cmake-3.12/Modules/FindBoost.cmake:963 (_Boost_COMPONENT_DEPENDENCIES)
  /usr/local/share/cmake-3.12/Modules/FindBoost.cmake:1622 (_Boost_MISSING_DEPENDENCIES)
  CMakeLists.txt:40 (find_package)


[the same FindBoost warning is repeated five more times for CMakeLists.txt:40 (find_package)]


-- Boost version: 1.72.0
-- Found the following Boost libraries:
--   regex
--   unit_test_framework
--   program_options
--   system
--   filesystem
--   iostreams
-- Found Open Babel include files at /usr/local/include/openbabel3
-- Found Open Babel library at /lib64/libopenbabel.so
Setting openbabel found TRUE
-- Found PythonLibs: /usr/lib64/libpython3.6m.so (found version "3.6.8")
-- Found PythonInterp: /usr/bin/python3.6 (found version "3.6.8")
CMake Warning at /usr/local/share/cmake-3.12/Modules/FindBoost.cmake:847 (message):
  New Boost version may have incorrect or missing dependencies and imported
  targets
Call Stack (most recent call first):
  /usr/local/share/cmake-3.12/Modules/FindBoost.cmake:963 (_Boost_COMPONENT_DEPENDENCIES)
  /usr/local/share/cmake-3.12/Modules/FindBoost.cmake:1622 (_Boost_MISSING_DEPENDENCIES)
  python/CMakeLists.txt:14 (find_package)


CMake Warning at /usr/local/share/cmake-3.12/Modules/FindBoost.cmake:847 (message):
  New Boost version may have incorrect or missing dependencies and imported
  targets
Call Stack (most recent call first):
  /usr/local/share/cmake-3.12/Modules/FindBoost.cmake:963 (_Boost_COMPONENT_DEPENDENCIES)
  /usr/local/share/cmake-3.12/Modules/FindBoost.cmake:1622 (_Boost_MISSING_DEPENDENCIES)
  python/CMakeLists.txt:14 (find_package)


CMake Warning at /usr/local/share/cmake-3.12/Modules/FindBoost.cmake:847 (message):
  New Boost version may have incorrect or missing dependencies and imported
  targets
Call Stack (most recent call first):
  /usr/local/share/cmake-3.12/Modules/FindBoost.cmake:963 (_Boost_COMPONENT_DEPENDENCIES)
  /usr/local/share/cmake-3.12/Modules/FindBoost.cmake:1622 (_Boost_MISSING_DEPENDENCIES)
  python/CMakeLists.txt:14 (find_package)


-- Boost version: 1.72.0
-- Found the following Boost libraries:
--   system
--   filesystem
--   python3
-- Found NumPy: /usr/local/lib64/python3.6/site-packages/numpy/core/include (found version "1.18.3")
CMake Warning at /usr/local/share/cmake-3.12/Modules/FindBoost.cmake:847 (message):
  New Boost version may have incorrect or missing dependencies and imported
  targets
Call Stack (most recent call first):
  /usr/local/share/cmake-3.12/Modules/FindBoost.cmake:963 (_Boost_COMPONENT_DEPENDENCIES)
  /usr/local/share/cmake-3.12/Modules/FindBoost.cmake:1622 (_Boost_MISSING_DEPENDENCIES)
  test/CMakeLists.txt:5 (find_package)


CMake Warning at /usr/local/share/cmake-3.12/Modules/FindBoost.cmake:847 (message):
  New Boost version may have incorrect or missing dependencies and imported
  targets
Call Stack (most recent call first):
  /usr/local/share/cmake-3.12/Modules/FindBoost.cmake:963 (_Boost_COMPONENT_DEPENDENCIES)
  /usr/local/share/cmake-3.12/Modules/FindBoost.cmake:1622 (_Boost_MISSING_DEPENDENCIES)
  test/CMakeLists.txt:5 (find_package)


-- Boost version: 1.72.0
-- Found the following Boost libraries:
--   unit_test_framework
--   system
-- Configuring done
-- Generating done
-- Build files have been written to: /home/wangqi/wangqi/downloads/libmolgrid-master/build

So I assume I can continue with the compilation. However, when I run the make command, I get the following error:

Scanning dependencies of target libmolgrid_static
[  1%] Building CXX object src/CMakeFiles/libmolgrid_static.dir/libmolgrid.cpp.o
[  2%] Building CXX object src/CMakeFiles/libmolgrid_static.dir/atom_typer.cpp.o
[  4%] Building CXX object src/CMakeFiles/libmolgrid_static.dir/example.cpp.o
[  5%] Building CXX object src/CMakeFiles/libmolgrid_static.dir/exampleref_providers.cpp.o
[  7%] Building CXX object src/CMakeFiles/libmolgrid_static.dir/example_extractor.cpp.o
[  8%] Building CXX object src/CMakeFiles/libmolgrid_static.dir/example_provider.cpp.o
[ 10%] Building CXX object src/CMakeFiles/libmolgrid_static.dir/grid_maker.cpp.o
[ 11%] Building CUDA object src/CMakeFiles/libmolgrid_static.dir/grid_maker.cu.o
/usr/local/include/boost/core/noncopyable.hpp(42): error: defaulted default constructor cannot be constexpr because the corresponding implicitly declared default constructor would not be constexpr

1 error detected in the compilation of "/tmp/tmpxft_00007b2b_00000000-9_grid_maker.compute_70.cpp1.ii".
make[2]: *** [src/CMakeFiles/libmolgrid_static.dir/grid_maker.cu.o] Error 1
make[1]: *** [src/CMakeFiles/libmolgrid_static.dir/all] Error 2
make: *** [all] Error 2

So I was wondering:

  1. Is adding set(Boost_NO_BOOST_CMAKE true) to CMakeLists.txt the correct way to make it work?
  2. What could be the cause of the /usr/local/include/boost/core/noncopyable.hpp(42): error?
  3. Actually I've also tried GCC 4.8 and 9.0, but neither worked. What versions of GCC, CUDA, Boost, Python, and CMake were you using when the gnina paper was published?

How to voxelize a single .pdb file?

Hello,
It appears that libmolgrid is a very powerful library, and its being open source is a big plus for the community. But as a beginner to the field, I find it quite non-intuitive (with minimal examples and little documentation), and I am struggling to perform a rather simple task. For example, if I have a protein file in .pdb format, how do I voxelize it? Looking at the examples, it seems that the code accepts .gninatypes files, but it does not explain anywhere how to get such .gninatypes files in the first place. Given that .pdb files are the most common format, it would serve the community well if you could provide a minimal (but complete) code example for processing a single .pdb file into a tensor.
I looked at other issues and it seems that someone else asked it as well. Although you have provided some comments there, it is still quite obscure for me as a complete beginner to the field.

I am happy to contribute to the documentation for the same publicly, if possible, to help others in the future with similar questions. Thank you.
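
A minimal sketch of one possible approach (not an official recipe): read the .pdb with OpenBabel's pybel bindings, assign a simple element-based one-hot type to each heavy atom, and grid the result with molgrid.Coords2Grid as in the centering example earlier on this page. The file name 'protein.pdb', the four element channels, and the uniform 1.5 A radius are illustrative assumptions; production code would normally use molgrid's built-in gnina atom typers instead.

import torch
import molgrid
from openbabel import pybel

elements = [6, 7, 8, 16]  # C, N, O, S channels (illustrative choice)
mol = next(pybel.readfile('pdb', 'protein.pdb'))  # hypothetical input file

heavy = [a for a in mol.atoms if a.atomicnum != 1]
coords = torch.tensor([a.coords for a in heavy], dtype=torch.float32)
types = torch.zeros((len(heavy), len(elements)), dtype=torch.float32)
for i, a in enumerate(heavy):
    if a.atomicnum in elements:
        types[i, elements.index(a.atomicnum)] = 1.0
radii = torch.full((len(heavy),), 1.5)  # crude uniform radius in Angstroms

center = tuple(coords.mean(dim=0).tolist())  # center the grid on the protein
gridder = molgrid.Coords2Grid(
    gmaker=molgrid.GridMaker(resolution=0.5, dimension=23.5),
    center=center)
grid = gridder.forward(coords, types, radii)  # expected shape: (4, 48, 48, 48)
print(grid.shape)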

ligonly.types file missing

The file ligonly.types seems to be missing from test/data and the pymolgrid test fails as follows:

11: =================================== FAILURES ===================================
11: ______________________ test_make_vector_types_ex_provider ______________________
11:
11: capsys = <_pytest.capture.CaptureFixture object at 0x7fd73b9b4dd8>
11:
11:     def test_make_vector_types_ex_provider(capsys):
11:         fname = datadir+"/ligonly.types"
11:         e = molgrid.ExampleProvider(data_root=datadir+"/structs",make_vector_types=True)
11: >       e.populate(fname)
11: E       ValueError: Could not open file /home/lina3015/git/libmolgrid/test/data/ligonly.types
11:
11: test/test_example_provider.py:194: ValueError

Fail to import the MolDataset

Hi,

I would like to use the MolDataset class shown in your documentation (https://gnina.github.io/libmolgrid/python/index.html?highlight=merge_coordinates#molgrid.torch_bindings.MolDataset), but I cannot import this class. I have tried both installing from source (version 0.2.1) and installing via pip. Both show the following ImportError:
ImportError: cannot import name 'MolDataset' from 'molgrid.torch_bindings' (/usr/local/lib/python3.7/dist-packages/molgrid-0.2.1.dev35+g281b3c3-py3.7.egg/molgrid/torch_bindings.py)
(pip install version: ImportError: cannot import name 'MolDataset' from 'molgrid.torch_bindings' (/opt/tiger/miniconda/lib/python3.7/site-packages/molgrid/torch_bindings.py))

Could you please help me with this issue?

Thanks!

what is the input format for ligand only

Hi, I tried to use libmolgrid to grid only ligands (ligonly.types in the data folder), with no binding protein. But I ran into the following error:
File "test_torch_cnn.py", line 52, in test_train_torch_cnn
e.populate(fname)
ValueError: Example has no label at position 0. There are only 0 labels

I would appreciate it if you could provide some clues or a tutorial on the input format for ligand-only data. Thanks!
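
One guess at a fix, sketched below (the exact .types column layout is an assumption inferred from gnina-style training files, and the paths are placeholders): each line appears to need at least one numeric label before the molecule path(s), so a ligand-only file would have a label followed by a single ligand file per line.

import molgrid

# hypothetical ligand-only .types file: one numeric label, then the ligand path
with open('ligonly.types', 'w') as f:
    f.write('1 ligands/mol1.sdf\n')  # placeholder paths
    f.write('0 ligands/mol2.sdf\n')

e = molgrid.ExampleProvider(data_root='.', make_vector_types=True)
e.populate('ligonly.types')
print(e.size(), 'examples loaded')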

How to apply prediction model to external set

Hi,

The examples of tests/test_torch_cnn.py and README.md are very easy to understand for me.

However, I do not know what I should do when applying a prediction model to an external set (i.e., a test set).

The next_batch method of molgrid.ExampleProvider seems to generate more examples than the data set contains (it wraps around and repeats). Is it possible to generate exactly the number of test examples?
For now, I use a for loop with batch_size = 1 to generate exactly the number of test examples.

Please let me know if there are other means.

Thanks for your help.
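
One possible workaround (a sketch, not an official API): keep the usual batched loop but use ExampleProvider.size() to know how many real examples there are, and drop the wrapped-around extras from the final batch. The provider settings, file names, and the stand-in model below are placeholders.

import math
import molgrid
import torch

e = molgrid.ExampleProvider(data_root='data/structs')  # placeholder paths
e.populate('test.types')
n = e.size()  # number of examples in the external/test set

gmaker = molgrid.GridMaker()
dims = gmaker.grid_dimensions(e.num_types())
batch_size = 32
tensor = torch.zeros((batch_size,) + dims, dtype=torch.float32)

model = torch.nn.Identity()  # stand-in for your trained network

preds = []
for _ in range(math.ceil(n / batch_size)):
    batch = e.next_batch(batch_size)  # wraps around past the end of the set
    gmaker.forward(batch, tensor)
    with torch.no_grad():
        preds.append(model(tensor).clone())  # clone: the input tensor is reused
preds = torch.cat(preds)[:n]  # keep exactly n predictions, discard duplicates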

Question about atom_typer.cpp

Hi,

Thank you for the awesome repository! I just had a quick question regarding a few lines in atom_typer.cpp, specifically the code snippet below:

switch(a->GetAtomicNum()) {
case 1:
  ename =  a->IsPolarHydrogen() ? "HD" : "H";
  break;
case 6:
  if(a->IsAromatic()) ename = "A";
  break;
case 7:
  if(a->IsHbondAcceptor()) ename = "NA";
  break;
case 8:
  ename = "OA";
  break;
case 16:
  if(a->IsHbondAcceptor()) ename = "SA";
  break;
case 34:
  ename = "S"; //historically selenium is treated as sulfur  ¯\_(ツ)_/¯
  break;
}

It seems to me as if the ename variable gets set to "OA" for any type of oxygen atom, no matter its hydrogen-bond-acceptor status. Is this intended behaviour? If so, why?

Equally, it seems to me that as a consequence the first two of the cases below will never occur:

case OxygenXSDonor: //O_O_O_D,
case Oxygen: //O_O_O_P,
  ret = Hbonded ? OxygenXSDonor : Oxygen;
  break;
case OxygenXSDonorAcceptor: //O_OA_O_DA, also an autodock acceptor
case OxygenXSAcceptor: //O_OA_O_A, also an autodock acceptor
  ret = Hbonded ? OxygenXSDonorAcceptor : OxygenXSAcceptor;
  break;

Is this simply hardcoding the fact that at least one of the lone pairs on the oxygen should be available to act as a hydrogen-bond acceptor? At least I can't think of a counterexample to that straight away... Anyway, I thought I should check just to be sure.

libmolgrid fails at make

Trying to build libmolgrid from source with GNU compilers 8.4.0.
The cmake stage is successful, I've previously built OpenBabel following the given instructions here: https://github.com/gnina/gnina

I have set the OpenBabel variables for the headers and libraries, and my cmake command looks like:
cmake -DCMAKE_CUDA_COMPILER=/usr/local/cuda-10.2/bin/nvcc -DOPENBABEL3_INCLUDE_DIR=/mnt/shared/releases/compiled/OpenBabel/include -DOPENBABEL3_LIBRARIES=/mnt/shared/releases/compiled/OpenBabel/lib -DCMAKE_INSTALL_PREFIX=/mnt/shared/releases/compiled/Libmolgrid ..

In the make stage I get the following errors when trying to build test_gridmaker_cpp:

CMakeFiles/test_coordinateset_cpp.dir/test_coordinateset.cpp.o: In function vectortyper::test_method()': /mnt/shared/releases/source/libmolgrid/test/test_coordinateset.cpp:36: undefined reference to OpenBabel::OBConversion::OBConversion(std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::__cxx11::basic_string<char, std::char_traits, std::allocator >)'
/mnt/shared/releases/source/libmolgrid/test/test_coordinateset.cpp:37: undefined reference to OpenBabel::OBMol::OBMol()' /mnt/shared/releases/source/libmolgrid/test/test_coordinateset.cpp:38: undefined reference to OpenBabel::OBConversion::Read(OpenBabel::OBBase*, std::istream*)'
/mnt/shared/releases/source/libmolgrid/test/test_coordinateset.cpp:37: undefined reference to OpenBabel::OBMol::~OBMol()' /mnt/shared/releases/source/libmolgrid/test/test_coordinateset.cpp:36: undefined reference to OpenBabel::OBConversion::~OBConversion()'
CMakeFiles/test_coordinateset_cpp.dir/test_coordinateset.cpp.o: In function vectortyper_invoker()': /mnt/shared/releases/source/libmolgrid/test/test_coordinateset.cpp:36: undefined reference to OpenBabel::OBConversion::OBConversion(std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::__cxx11::basic_string<char, std::char_traits, std::allocator >)'
/mnt/shared/releases/source/libmolgrid/test/test_coordinateset.cpp:37: undefined reference to OpenBabel::OBMol::OBMol()' /mnt/shared/releases/source/libmolgrid/test/test_coordinateset.cpp:38: undefined reference to OpenBabel::OBConversion::Read(OpenBabel::OBBase*, std::istream*)'
/mnt/shared/releases/source/libmolgrid/test/test_coordinateset.cpp:37: undefined reference to OpenBabel::OBMol::~OBMol()' /mnt/shared/releases/source/libmolgrid/test/test_coordinateset.cpp:36: undefined reference to OpenBabel::OBConversion::~OBConversion()'
CMakeFiles/test_coordinateset_cpp.dir/test_coordinateset.cpp.o: In function vectortyper::test_method()': /mnt/shared/releases/source/libmolgrid/test/test_coordinateset.cpp:37: undefined reference to OpenBabel::OBMol::~OBMol()'
/mnt/shared/releases/source/libmolgrid/test/test_coordinateset.cpp:36: undefined reference to OpenBabel::OBConversion::~OBConversion()' CMakeFiles/test_coordinateset_cpp.dir/test_coordinateset.cpp.o: In function vectortyper_invoker() [clone .cold.228]':
/mnt/shared/releases/source/libmolgrid/test/test_coordinateset.cpp:37: undefined reference to OpenBabel::OBMol::~OBMol()' /mnt/shared/releases/source/libmolgrid/test/test_coordinateset.cpp:36: undefined reference to OpenBabel::OBConversion::~OBConversion()'
../lib/libmolgrid.a(atom_typer.cpp.o): In function libmolgrid::ElementIndexTyper::get_atom_type_index(OpenBabel::OBAtom*) const': /mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:211: undefined reference to OpenBabel::OBElements::GetCovalentRad(unsigned int)'
../lib/libmolgrid.a(atom_typer.cpp.o): In function libmolgrid::ElementIndexTyper::get_int_type(int) const': /mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:218: undefined reference to OpenBabel::OBElements::GetCovalentRad(unsigned int)'
../lib/libmolgrid.a(atom_typer.cpp.o): In function libmolgrid::GninaIndexTyper::get_atom_type_index(OpenBabel::OBAtom*) const': /mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:88: undefined reference to OpenBabel::OBAtomAtomIter::OBAtomAtomIter(OpenBabel::OBAtom*)'
/mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:88: undefined reference to OpenBabel::OBAtomAtomIter::operator++()' /mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:95: undefined reference to OpenBabel::OBElements::GetSymbol(unsigned int)'
/mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:101: undefined reference to OpenBabel::OBAtom::IsPolarHydrogen()' /mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:113: undefined reference to OpenBabel::OBAtom::IsHbondAcceptor()'
/mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:107: undefined reference to OpenBabel::OBAtom::IsHbondAcceptor()' /mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:104: undefined reference to OpenBabel::OBAtom::IsAromatic() const'
../lib/libmolgrid.a(atom_typer.cpp.o): In function libmolgrid::GninaVectorTyper::get_atom_type_vector(OpenBabel::OBAtom*, std::vector<float, std::allocator<float> >&) const': /mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:389: undefined reference to OpenBabel::OBAtom::GetPartialCharge()'
/mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:390: undefined reference to OpenBabel::OBAtom::IsAromatic() const' ../lib/libmolgrid.a(atom_typer.cpp.o): In function libmolgrid::ElementIndexTyper::get_type_radii() const':
/mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:238: undefined reference to OpenBabel::OBElements::GetCovalentRad(unsigned int)' ../lib/libmolgrid.a(atom_typer.cpp.o): In function libmolgrid::ElementIndexTyper::get_type_namesabi:cxx11 const':
/mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:228: undefined reference to OpenBabel::OBElements::GetName(unsigned int)' ../lib/libmolgrid.a(coordinateset.cpp.o): In function libmolgrid::CoordinateSet::CoordinateSet(OpenBabel::OBMol*, libmolgrid::AtomTyper const&)':
/mnt/shared/releases/source/libmolgrid/src/coordinateset.cpp:30: undefined reference to OpenBabel::OBMolAtomIter::OBMolAtomIter(OpenBabel::OBMol*)' /mnt/shared/releases/source/libmolgrid/src/coordinateset.cpp:30: undefined reference to OpenBabel::OBMolAtomIter::operator++()'
collect2: error: ld returned 1 exit status
test/CMakeFiles/test_coordinateset_cpp.dir/build.make:108: recipe for target 'bin/test_coordinateset_cpp' failed
make[2]: *** [bin/test_coordinateset_cpp] Error 1
CMakeFiles/Makefile2:1083: recipe for target 'test/CMakeFiles/test_coordinateset_cpp.dir/all' failed
make[1]: *** [test/CMakeFiles/test_coordinateset_cpp.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
../lib/libmolgrid.a(atom_typer.cpp.o): In function libmolgrid::ElementIndexTyper::get_atom_type_index(OpenBabel::OBAtom*) const': /mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:211: undefined reference to OpenBabel::OBElements::GetCovalentRad(unsigned int)'
../lib/libmolgrid.a(atom_typer.cpp.o): In function libmolgrid::ElementIndexTyper::get_int_type(int) const': /mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:218: undefined reference to OpenBabel::OBElements::GetCovalentRad(unsigned int)'
../lib/libmolgrid.a(atom_typer.cpp.o): In function libmolgrid::GninaIndexTyper::get_atom_type_index(OpenBabel::OBAtom*) const': /mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:88: undefined reference to OpenBabel::OBAtomAtomIter::OBAtomAtomIter(OpenBabel::OBAtom*)'
/mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:88: undefined reference to OpenBabel::OBAtomAtomIter::operator++()' /mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:95: undefined reference to OpenBabel::OBElements::GetSymbol(unsigned int)'
/mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:101: undefined reference to OpenBabel::OBAtom::IsPolarHydrogen()' /mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:113: undefined reference to OpenBabel::OBAtom::IsHbondAcceptor()'
/mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:107: undefined reference to OpenBabel::OBAtom::IsHbondAcceptor()' /mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:104: undefined reference to OpenBabel::OBAtom::IsAromatic() const'
../lib/libmolgrid.a(atom_typer.cpp.o): In function libmolgrid::GninaVectorTyper::get_atom_type_vector(OpenBabel::OBAtom*, std::vector<float, std::allocator<float> >&) const': /mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:389: undefined reference to OpenBabel::OBAtom::GetPartialCharge()'
/mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:390: undefined reference to OpenBabel::OBAtom::IsAromatic() const' ../lib/libmolgrid.a(atom_typer.cpp.o): In function libmolgrid::ElementIndexTyper::get_type_radii() const':
/mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:238: undefined reference to OpenBabel::OBElements::GetCovalentRad(unsigned int)' ../lib/libmolgrid.a(atom_typer.cpp.o): In function libmolgrid::ElementIndexTyper::get_type_namesabi:cxx11 const':
/mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:228: undefined reference to OpenBabel::OBElements::GetName(unsigned int)' ../lib/libmolgrid.a(coordinateset.cpp.o): In function libmolgrid::CoordinateSet::CoordinateSet(OpenBabel::OBMol*, libmolgrid::AtomTyper const&)':
/mnt/shared/releases/source/libmolgrid/src/coordinateset.cpp:30: undefined reference to OpenBabel::OBMolAtomIter::OBMolAtomIter(OpenBabel::OBMol*)' /mnt/shared/releases/source/libmolgrid/src/coordinateset.cpp:30: undefined reference to OpenBabel::OBMolAtomIter::operator++()'
../lib/libmolgrid.a(coord_cache.cpp.o): In function libmolgrid::CoordCache::set_coords(char const*, libmolgrid::CoordinateSet&)': /mnt/shared/releases/source/libmolgrid/src/coord_cache.cpp:148: undefined reference to OpenBabel::OBConversion::OBConversion(std::istream*, std::ostream*)'
/mnt/shared/releases/source/libmolgrid/src/coord_cache.cpp:149: undefined reference to OpenBabel::OBMol::OBMol()' /mnt/shared/releases/source/libmolgrid/src/coord_cache.cpp:150: undefined reference to OpenBabel::OBConversion::ReadFile(OpenBabel::OBBase*, std::__cxx11::basic_string<char, std::char_traits, std::allocator >)'
/mnt/shared/releases/source/libmolgrid/src/coord_cache.cpp:154: undefined reference to OpenBabel::OBMol::AddHydrogens(bool, bool, double)' /mnt/shared/releases/source/libmolgrid/src/coord_cache.cpp:149: undefined reference to OpenBabel::OBMol::~OBMol()'
/mnt/shared/releases/source/libmolgrid/src/coord_cache.cpp:148: undefined reference to OpenBabel::OBConversion::~OBConversion()' /mnt/shared/releases/source/libmolgrid/src/coord_cache.cpp:149: undefined reference to OpenBabel::OBMol::~OBMol()'
/mnt/shared/releases/source/libmolgrid/src/coord_cache.cpp:148: undefined reference to `OpenBabel::OBConversion::~OBConversion()'
collect2: error: ld returned 1 exit status
test/CMakeFiles/test_gridmaker_cpp.dir/build.make:108: recipe for target 'bin/test_gridmaker_cpp' failed
make[2]: *** [bin/test_gridmaker_cpp] Error 1
CMakeFiles/Makefile2:1031: recipe for target 'test/CMakeFiles/test_gridmaker_cpp.dir/all' failed
make[1]: *** [test/CMakeFiles/test_gridmaker_cpp.dir/all] Error 2

and test_gridmaker_cu:

../lib/libmolgrid.a(atom_typer.cpp.o): In function libmolgrid::ElementIndexTyper::get_atom_type_index(OpenBabel::OBAtom*) const': /mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:211: undefined reference to OpenBabel::OBElements::GetCovalentRad(unsigned int)'
../lib/libmolgrid.a(atom_typer.cpp.o): In function libmolgrid::ElementIndexTyper::get_int_type(int) const': /mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:218: undefined reference to OpenBabel::OBElements::GetCovalentRad(unsigned int)'
../lib/libmolgrid.a(atom_typer.cpp.o): In function libmolgrid::GninaIndexTyper::get_atom_type_index(OpenBabel::OBAtom*) const': /mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:88: undefined reference to OpenBabel::OBAtomAtomIter::OBAtomAtomIter(OpenBabel::OBAtom*)'
/mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:88: undefined reference to OpenBabel::OBAtomAtomIter::operator++()' /mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:95: undefined reference to OpenBabel::OBElements::GetSymbol(unsigned int)'
/mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:101: undefined reference to OpenBabel::OBAtom::IsPolarHydrogen()' /mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:113: undefined reference to OpenBabel::OBAtom::IsHbondAcceptor()'
/mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:107: undefined reference to OpenBabel::OBAtom::IsHbondAcceptor()' /mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:104: undefined reference to OpenBabel::OBAtom::IsAromatic() const'
../lib/libmolgrid.a(atom_typer.cpp.o): In function libmolgrid::GninaVectorTyper::get_atom_type_vector(OpenBabel::OBAtom*, std::vector<float, std::allocator<float> >&) const': /mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:389: undefined reference to OpenBabel::OBAtom::GetPartialCharge()'
/mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:390: undefined reference to OpenBabel::OBAtom::IsAromatic() const' ../lib/libmolgrid.a(atom_typer.cpp.o): In function libmolgrid::ElementIndexTyper::get_type_radii() const':
/mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:238: undefined reference to OpenBabel::OBElements::GetCovalentRad(unsigned int)' ../lib/libmolgrid.a(atom_typer.cpp.o): In function libmolgrid::ElementIndexTyper::get_type_namesabi:cxx11 const':
/mnt/shared/releases/source/libmolgrid/src/atom_typer.cpp:228: undefined reference to OpenBabel::OBElements::GetName(unsigned int)' ../lib/libmolgrid.a(coordinateset.cpp.o): In function libmolgrid::CoordinateSet::CoordinateSet(OpenBabel::OBMol*, libmolgrid::AtomTyper const&)':
/mnt/shared/releases/source/libmolgrid/src/coordinateset.cpp:30: undefined reference to OpenBabel::OBMolAtomIter::OBMolAtomIter(OpenBabel::OBMol*)' /mnt/shared/releases/source/libmolgrid/src/coordinateset.cpp:30: undefined reference to OpenBabel::OBMolAtomIter::operator++()'
../lib/libmolgrid.a(coord_cache.cpp.o): In function libmolgrid::CoordCache::set_coords(char const*, libmolgrid::CoordinateSet&)': /mnt/shared/releases/source/libmolgrid/src/coord_cache.cpp:148: undefined reference to OpenBabel::OBConversion::OBConversion(std::istream*, std::ostream*)'
/mnt/shared/releases/source/libmolgrid/src/coord_cache.cpp:149: undefined reference to OpenBabel::OBMol::OBMol()' /mnt/shared/releases/source/libmolgrid/src/coord_cache.cpp:150: undefined reference to OpenBabel::OBConversion::ReadFile(OpenBabel::OBBase*, std::__cxx11::basic_string<char, std::char_traits, std::allocator >)'
/mnt/shared/releases/source/libmolgrid/src/coord_cache.cpp:154: undefined reference to OpenBabel::OBMol::AddHydrogens(bool, bool, double)' /mnt/shared/releases/source/libmolgrid/src/coord_cache.cpp:149: undefined reference to OpenBabel::OBMol::~OBMol()'
/mnt/shared/releases/source/libmolgrid/src/coord_cache.cpp:148: undefined reference to OpenBabel::OBConversion::~OBConversion()' /mnt/shared/releases/source/libmolgrid/src/coord_cache.cpp:149: undefined reference to OpenBabel::OBMol::~OBMol()'
/mnt/shared/releases/source/libmolgrid/src/coord_cache.cpp:148: undefined reference to `OpenBabel::OBConversion::~OBConversion()'
collect2: error: ld returned 1 exit status
test/CMakeFiles/test_gridmaker_cu.dir/build.make:108: recipe for target 'bin/test_gridmaker_cu' failed
make[2]: *** [bin/test_gridmaker_cu] Error 1
CMakeFiles/Makefile2:1005: recipe for target 'test/CMakeFiles/test_gridmaker_cu.dir/all' failed
make[1]: *** [test/CMakeFiles/test_gridmaker_cu.dir/all] Error 2

Could you please help? I am wondering what I am doing wrong.
I am installing on Ubuntu 18.04.

Thank you and Best Regards,
Davide

conda obabel and molgrid

Hi,
Thank you so much for creating this project. I am hoping to get some ideas about how to debug an install issue I am running into. I am trying to use the CoordinateSet module within molgrid so that it can interface with obabel. To that end, I started with a fresh conda Python 3.6 or 3.7 environment, then installed obabel from conda and molgrid using either pip or conda. I am seeing random kernel crashes when I try to run the CoordinateSet tests.

import molgrid
from openbabel import pybel  # or the equivalent import for the installed Open Babel version

m = pybel.readstring('smi', 'c1ccccc1CO')
m.addh()
m.make3D()

c = molgrid.CoordinateSet(m)  # default gnina ligand types
Running the coordinate set tests gives:

rootdir: /data/software/libmolgrid/test
plugins: flaky-3.6.0, cov-2.7.1, xdist-1.30.0, Flask-Dance-2.2.0, forked-1.1.3
collected 4 items                                                                                                                                                 

test_coordinateset.py::test_coordset_from_mol FAILED                                                                                                        [ 25%]
test_coordinateset.py::test_coordset_from_mol_vec Fatal Python error: Segmentation fault

Any ideas as to how I should build the libraries so that it may properly interface with obabel? Thank you for all your help!
Best,
Muneeb

libmolgrid on OS X?

Hello!
I've been trying to install libmolgrid on OS X and failing miserably, and before I spend more time on this I just want to make sure: is libmolgrid meant to work only on Linux?

Thank you!

RDKit failures if molgrid is imported first

I encountered an odd incompatibility between molgrid and rdkit which seems to depend on the order of import statements. I installed molgrid using pip in a conda environment where rdkit has been installed from the conda-forge channel.


The following snippet works as expected:

from rdkit import Chem
from rdkit.Chem import AllChem

m = Chem.MolFromSmiles('C1CCC1OC')
m2 = Chem.AddHs(m)
cids = AllChem.EmbedMultipleConfs(m2, numConfs=2)

If molgrid is imported before rdkit:

import molgrid
from rdkit import Chem
from rdkit.Chem import AllChem

m = Chem.MolFromSmiles('C1CCC1OC')
m2 = Chem.AddHs(m)
cids = AllChem.EmbedMultipleConfs(m2, numConfs=2)

I get the following failure on the last line (EmbedMultipleConfs call):

TypeError: No to_python (by-value) converter found for C++ type: std::vector<int, std::allocator<int> >

The error does not appear if molgrid is imported after rdkit:

from rdkit import Chem
from rdkit.Chem import AllChem
import molgrid

m = Chem.MolFromSmiles('C1CCC1OC')
m2 = Chem.AddHs(m)
cids = AllChem.EmbedMultipleConfs(m2, numConfs=2)

conda environment to reproduce the issue:

name: rdkit-molgrid
channels:
  - conda-forge
  - pytorch
dependencies:
  - python=3.7
  - ipython
  - pip

  - rdkit=2021.03.3
  - cudatoolkit=11.1
  - pytorch

  - pip:
    - molgrid==0.2.1

how to make the grid density much sharper?

Hi,
The grid densities of compounds with complex structures overlap heavily, which makes it difficult to generate such compounds using liGAN. I would therefore like to sharpen the grid density. Could you please tell me how to do that in libmolgrid? Thanks a lot.
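
A minimal sketch of the knobs I believe control this, assuming the GridMaker keyword arguments that appear in other reports in this thread (radius_scale shrinks the effective atomic radii, binary=True switches from Gaussian densities to hard 0/1 occupancy); treat the exact values as illustrative:

import molgrid

# Sharper but still smooth: shrink the per-atom radii used for the Gaussians
gmaker_sharp = molgrid.GridMaker(resolution=0.5, dimension=23.5,
                                 binary=False, radius_scale=0.5)

# Sharpest: hard occupancy (0 or 1) inside each atom's radius instead of a Gaussian
gmaker_binary = molgrid.GridMaker(resolution=0.5, dimension=23.5, binary=True)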

visualization of grid density of protein-ligand

Hi prof. Koes,

I am training a CNN model for predicting the binding pose. Also, I plan to visualize the generated grid density of the input protein-ligand 3d data.

Does libmolgrid provide a function that can be used directly to visualize the grid density, or is there any source code available in the gnina GitHub repos?

Thank you!
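
A hedged sketch of one way to inspect a generated grid without any dedicated libmolgrid support: copy the tensor to NumPy and plot a slice of the summed channels with matplotlib. The variable grid is assumed to be a (channels, N, N, N) torch tensor produced by GridMaker.forward; nothing below is a gnina/libmolgrid API beyond that assumption.

import numpy as np
import matplotlib.pyplot as plt

# grid: (channels, N, N, N) torch tensor filled by GridMaker.forward (assumption)
density = grid.cpu().numpy().sum(axis=0)   # collapse channels into one volume
mid = density.shape[0] // 2                # index of the central z-slice

plt.imshow(density[mid], origin='lower')
plt.colorbar(label='summed density')
plt.title('Central slice of the summed grid density')
plt.show()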

"CUDA Error: invalid argument" when use molgrid.GridMaker in multi-GPU setting

Hi,

I am currently using molgrid.GridMaker in our project to convert data to its grid representation. It works well in a single-GPU setting, but when I tried to use Distributed Data Parallel from PyTorch to accelerate training, molgrid threw a RuntimeError like this (with a 2-GPU setting):

/opt/tiger/libmolgrid/src/grid_maker.cu:417: invalid argumentTraceback (most recent call last):
  File "main_align.py", line 225, in <module>
    main(args)
  File "main_align.py", line 82, in main
    run_ddp_train(args)
  File "/home/tiger/world/ddp_train_utils.py", line 150, in run_ddp_train
    join=True)
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 199, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
    while not context.join():
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception: 

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/home/tiger/world/ddp_train_utils.py", line 135, in ddp_train_fn
    trainer.train(rank, ddp_model, train_dataset, valid_dataset, evaluator)
  File "/home/tiger/world/engine/trainer.py", line 327, in train
    loss, others = model(batch)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/distributed.py", line 619, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/tiger/world/models/grid_rigid_alignment_prediction.py", line 347, in forward
    grid = self.gmaker(coords, data.types, data.radii, data.batch, to_sparse=False)
  File "/home/tiger/world/models/grid_rigid_alignment_prediction.py", line 331, in gmaker
    grid = Coords2GridFunction.apply(self.grid_maker, (0, 0, 0), coords.to(self.device), types_onehot.to(self.device), radii.to(self.device)).to(self.device)
  File "/usr/local/lib/python3.7/dist-packages/molgrid-0.2.1.dev47+g2375f0b-py3.7.egg/molgrid/torch_bindings.py", line 80, in forward
    gmaker.forward(center, coords, types, radii, output)
RuntimeError: CUDA Error: invalid argument

In this code, "self.grid_maker" is an instance of GridMaker and "self.device" indicates the rank of each process. The code works correctly if I change "self.device" to "cpu" (i.e., run only Coords2GridFunction on the CPU and the rest on the different GPUs), but that slows down training. I cannot figure out what is wrong here. Could you give me some advice? Thanks!
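
In case it helps to narrow this down, a hedged sketch of a possible workaround: libmolgrid appears to keep its own active-CUDA-device state, so selecting the per-process GPU for both torch and molgrid before any forward call may avoid the error. set_gpu_device is an assumption about the binding name; please check the libmolgrid docs for your version.

import molgrid
import torch

def ddp_worker(rank):
    # Hedged sketch: make molgrid's CUDA context match this process's GPU
    # before any GridMaker.forward call (set_gpu_device is assumed here).
    torch.cuda.set_device(rank)
    molgrid.set_gpu_device(rank)

    gmaker = molgrid.GridMaker()
    # ... allocate tensors on f'cuda:{rank}' and call gmaker.forward as usual ...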

Iterating over ExampleDataset

molgrid version: 0.5.1
A really minor bug report: the ExampleDataset object is not safe to iterate over.

import molgrid
data_root = '...'
types_file = '...'  # with 3 members
examples = molgrid.ExampleDataset(data_root=data_root)
examples.populate(types_file)
for example in examples:
    print(example)

Out:

<molgrid.molgrid.Example object at 0x7f202c2f5e30>
<molgrid.molgrid.Example object at 0x7f202c33af30>
<molgrid.molgrid.Example object at 0x7f202c2f5e30>
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-93-f0381f095000> in <module>
----> 1 for example in examples:
      2     print(example)

ValueError: Invalid index: 3 > 3

The current workaround is to use range and index the object:

for i in range(len(examples)):
    print(examples[i])

As I said, this is very minor since the workaround is trivial, but I thought I'd report it anyway.

Unsupported GPU architecture 'compute_75'

When I try to compile libmolgrid (on a Singularity container bootstrapped from nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04) I get the following error:

nvcc fatal   : Unsupported gpu architecture 'compute_75'

I might be wrong because I never dug deep into CUDA, but I believe that compute_75 requires CUDA 10; changing CMakeLists.txt from

set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -Werror cross-execution-space-call,deprecated-declarations \
 -gencode arch=compute_35,code=sm_35  \
 -gencode arch=compute_50,code=sm_50 \
 -gencode arch=compute_60,code=sm_60 \
 -gencode arch=compute_70,code=sm_70 \
 -gencode arch=compute_75,code=sm_75 \
 ")

to

set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -Werror cross-execution-space-call,deprecated-declarations \
 -gencode arch=compute_35,code=sm_35  \
 -gencode arch=compute_50,code=sm_50 \
 -gencode arch=compute_60,code=sm_60 \
 -gencode arch=compute_70,code=sm_70 \
 ")

gets rid of the issue (and all tests not involving Python pass).

Shared Pointer Question

Hello,
I have a question regarding your implementation. I was looking for a library that generates batches of examples exactly like this one. Mainly, I have a problem passing a std::shared_ptr to a Python class using Boost.Python.
I compiled your library and tried to see how this can be done, and when I executed these lines:

x = molgrid.FileMappedGninaTyper(data + "/recmap")
# passing shared_ptr
e = molgrid.ExampleProvider(x, data_root=datadir + "/structs")

it throws this error:

TypeError: No registered converter was able to produce a C++ rvalue of type std::shared_ptr<libmolgrid::AtomTyper> from this Python object of type FileMappedGninaTyper

I am using Ubuntu's default Boost, but this is exactly the problem I am trying to solve in my own code.

@dkoes
@Jsunseri

errors during make gnina

I am using CentOS 7, Anaconda 3, Boost 1.67, and CUDA 9.0. RDKit compiled successfully, and the gnina cmake step seems fine. When I run make to compile gnina, there are two problems:

~/gnina/build/libmolgrid-prefix/src/libmolgrid/include/libmolgrid/atom_typer.h:13:10: fatal error: openbabel/elements.h: No such file or directory

The dependency target "pycaffe" of target "pytest" does not exist

Can anyone figure out where the problem is?

atom type values higher than 1 when resolution is 1

Hello, I understand that "atom type information is represented as a density distribution around the atom center" and I believe the function you are using is the one presented in this paper: https://doi.org/10.1021/acs.jcim.6b00740. Therefore, I'd expect the atom type values to be constrained between 0 and 1. This is true when the GridMaker has resolution=0.5 and dimension=23.5, but when I change to resolution=1.0 and dimension=23, I observe that some of the values in the grid/tensor are above 1.

When the resolution is 0.5, the max value is 1. When I change the resolution to 1.0 the max values are higher than one. Am I doing something wrong?

Thank you!

Here is a small example, my types file had 10 examples:

import molgrid
import torch

datadir = "data"
dname = "check_voxel.types"


d = molgrid.ExampleProvider(shuffle=False, balanced=False,
                            stratify_receptor=False, data_root=datadir)
d.populate(dname)

gmaker = molgrid.GridMaker(resolution=1.0, dimension=23, binary=False,
                           radius_type_indexed=False, radius_scale=1.0, gaussian_radius_multiple=1.0)
gmaker2 = molgrid.GridMaker(resolution=0.5, dimension=23.5, binary=True,
                           radius_type_indexed=False, radius_scale=1.0, gaussian_radius_multiple=1.0)

ddims = gmaker.grid_dimensions(d.num_types())
ddims2 = gmaker2.grid_dimensions(d.num_types())

batch_size = 1
dtensor_shape = (batch_size,) + ddims
dtensor_shape2 = (batch_size,) + ddims2

dinput_tensor = torch.zeros(dtensor_shape, dtype=torch.float32)
dinput_tensor2 = torch.zeros(dtensor_shape2, dtype=torch.float32)

for i in range(0,10):
    dbatch = d.next_batch(batch_size)
    gmaker.forward(dbatch, dinput_tensor, random_translation=0.0, random_rotation=False)
    gmaker2.forward(dbatch, dinput_tensor2, random_translation=0.0, random_rotation=False)

    print('res 1.0')
    print(torch.max(dinput_tensor))
    print('res 0.5')
    print(torch.max(dinput_tensor2))

Example outputs are:

res 1.0
tensor(2.0447)
res 0.5
tensor(1.)
res 1.0
tensor(2.0685)
res 0.5
tensor(1.)
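
For what it's worth, a small numeric sketch (independent of libmolgrid) of why continuous densities can exceed 1: if the non-binary grid simply sums the Gaussian contribution of every atom in a channel, two nearby atoms of the same type already push the value above 1 between them, whereas a binary grid is capped at 1. The formula below is only the Gaussian part of the density from the cited paper, and the radius and distance are illustrative.

import numpy as np

def gaussian_part(d, r):
    # exp(-2 d^2 / r^2), the Gaussian portion of the per-atom density in the cited paper
    return np.exp(-2.0 * d**2 / r**2)

r = 1.8        # illustrative heavy-atom radius (Angstroms)
bond = 1.5     # two bonded atoms of the same type, 1.5 A apart
midpoint = bond / 2.0

total = gaussian_part(midpoint, r) + gaussian_part(midpoint, r)
print(total)   # ~1.41: the summed density between the atoms exceeds 1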

Operating on pairs of inputs

Hey guys, nice library! I've enjoyed using libmolgrid to train on protein-ligand binding affinities, but I'd now like to operate on pairs of input PDBs, where each pair is assigned a single label. Do you have any pointers on how one might achieve this with libmolgrid while still making use of the structure cache? I am working with PyTorch. Thanks in advance.
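
A hedged sketch of one workaround that only uses pieces shown elsewhere in these reports: keep two row-aligned .types files in whatever column layout you already use (one per member of each pair, with the shared label duplicated in both), drive two unshuffled ExampleProviders in lockstep, grid each side, and concatenate along the channel dimension in torch. File names and the batch size are illustrative.

import molgrid
import torch

prov_a = molgrid.ExampleProvider(data_root='structs/', shuffle=False)
prov_b = molgrid.ExampleProvider(data_root='structs/', shuffle=False)
prov_a.populate('pairs_memberA.types')   # line i describes member A of pair i
prov_b.populate('pairs_memberB.types')   # line i describes member B of pair i

gmaker = molgrid.GridMaker()
dims = gmaker.grid_dimensions(prov_a.num_types())
batch_size = 8

grid_a = torch.zeros((batch_size,) + dims, dtype=torch.float32)
grid_b = torch.zeros((batch_size,) + dims, dtype=torch.float32)

gmaker.forward(prov_a.next_batch(batch_size), grid_a)
gmaker.forward(prov_b.next_batch(batch_size), grid_b)

# one input per pair: both gridded structures stacked along the channel axis
paired_input = torch.cat([grid_a, grid_b], dim=1)

Because both providers are created with their default settings apart from shuffle=False, the structure cache should still be used for each of them.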

pymolgrid test failing

The pymolgrid test, more specifically the test_make_vector_types_ex_provider in test_example_provider, fails after commit 176b37b:

11: =================================== FAILURES ===================================
11: ______________________ test_make_vector_types_ex_provider ______________________
11: 
11: capsys = <_pytest.capture.CaptureFixture object at 0x7f12b6a2c588>
11: 
11:     def test_make_vector_types_ex_provider(capsys):
11:         fname = datadir+"/ligonly.types"
11:         e = molgrid.ExampleProvider(molgrid.NullIndexTyper(),molgrid.defaultGninaLigandTyper, data_root=datadir+"/structs",make_vector_types=True)
11:         e.populate(fname)
11:         batch_size = 10
11:         b = e.next_batch(batch_size)
11:     
11:         gmaker = molgrid.GridMaker(dimension=23.5,radius_type_indexed=True)
11:         shape = gmaker.grid_dimensions(molgrid.defaultGninaLigandTyper.num_types())
11:         mgrid = molgrid.MGrid5f(batch_size,*shape)
11:     
11:         c = b[0].merge_coordinates()
11:         tv = c.type_vector.tonumpy()
11: >       assert tv.shape == (10,14) #no dummy type
11: E       assert (0, 0) == (10, 14)
11: E         At index 0 diff: 0 != 10
11: E         Use -v to get the full diff

In the previous commit c1749bd all tests pass.

Info

Tested within a Singularity container based on gnina Docker image:

Bootstrap: docker
From: dkoes/gnina

%setup

%environment

%post

    # Update
    apt-get -y update

    #Install essentials
    apt-get -y install vim

%runscript

    export SINGULARITYENV_CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES

%test

Memory leak using cache_structs

I noticed a lot of my jobs using Tensorflow with molgrid were getting OOM errors at about 6 hours runtime. I tracked the memory consumed by the python process:

[image: memory usage of the Python process over time]

This equates to about 3.5 GB/hour, and my models often take 24+ hours to run, hence the OOMs.

I isolated the leak to one line of code:

batch = e.next_batch(batch_size)

where e is an ExampleProvider object. After rooting around in the docs, I found the 'cache_structs' option to be suspect, so I constructed my ExampleProvider with cache_structs=False, and the leak disappeared:

[image: memory usage with cache_structs=False]

I do not know if this is a leak in the strictest sense or intended behaviour. Perhaps a (configurable?) ceiling on the size of the cache would be useful?
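
For reference, the one-line change that made the leak disappear for me (keyword name as above; the other arguments are illustrative):

import molgrid

e = molgrid.ExampleProvider(data_root='data/', cache_structs=False)
e.populate('train.types')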

What is the difference between molgrid in libmolgrid and gnina?

Hi,

I would like to use this repository with great interest in your libmolgrid.

I want to ask what the difference is between molgrid in libmolgrid and gnina.
Is it correct to understand that molgrid in libmolgrid implements the grid-generation functionality of gnina?

Also, I read "Do not use" in the README.
Does that mean operation is not guaranteed?
Is it possible to use it under the Apache license?

Thanks for your help.

cmake error


-- The C compiler identification is GNU 7.3.0
-- The CXX compiler identification is GNU 7.3.0
-- The CUDA compiler identification is NVIDIA 9.0.176
-- Check for working C compiler: /mnt1/lijinqiang/conda/env/gnina/bin/x86_64-conda_cos6-linux-gnu-cc
-- Check for working C compiler: /mnt1/lijinqiang/conda/env/gnina/bin/x86_64-conda_cos6-linux-gnu-cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /mnt1/lijinqiang/conda/env/gnina/bin/x86_64-conda_cos6-linux-gnu-c++
-- Check for working CXX compiler: /mnt1/lijinqiang/conda/env/gnina/bin/x86_64-conda_cos6-linux-gnu-c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Found Git: /usr/bin/git
-- Current git revision is 57f737a
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found CUDA: /usr/local/cuda (found version "9.0")
-- Boost found.
-- Found Boost components:
regex;unit_test_framework;program_options;system;filesystem;iostreams
-- Found Open Babel include files at /usr/local/include/openbabel3
-- Found Open Babel library at /lib64/libopenbabel.so
Setting openbabel found TRUE
-- Found PythonLibs: /usr/lib64/libpython3.6m.so (found version "3.6.8")
-- Found PythonInterp: /mnt1/lijinqiang/conda/env/gnina/bin/python3.6 (found version "3.6.7")
CMake Error at /usr/local/boost/lib/cmake/Boost-1.72.0/BoostConfig.cmake:120 (find_package):
Found package configuration file:

/usr/local/boost/lib/cmake/boost_python-1.72.0/boost_python-config.cmake

but it set boost_python_FOUND to FALSE so package "boost_python" is
considered to be NOT FOUND. Reason given by package:

No suitable build variant has been found.

The following variants have been tried and rejected:

  • libboost_python27.so.1.72.0 (2.7, Boost_PYTHON_VERSION=3.6)

  • libboost_python27.a (2.7, Boost_PYTHON_VERSION=3.6)

Call Stack (most recent call first):
/usr/local/boost/lib/cmake/Boost-1.72.0/BoostConfig.cmake:185 (boost_find_component)
/usr/local/share/cmake-3.14/Modules/FindBoost.cmake:266 (find_package)
python/CMakeLists.txt:17 (find_package)

-- Configuring incomplete, errors occurred!


I am using Python 3.6, CMake 3.14, and Boost 1.72. What is the problem? Please help me.

install error

Could you tell me how to install libmolgrid into Python? It does not work when I use "python setup.py install". Thank you!

running install
running build
running build_py
error: package directory '${CMAKE_CURRENT_BINARY_DIR}/molgrid' does not exist

Protein only

Hey David,

I am only interested in the protein channels, but I would still like to center the grid on the centroid of a ligand (or pseudo-ligand).

  • I have a working example where I throw away the final 14 channels, but this seems like wasted computation.
  • I have also tried using NullIndexTyper() for the ligand; while this does only calculate the protein channels, it removes the ability to calculate the centroid (so the voxels end up in the wrong place). A fix could be to read the ligand file separately using RDKit and transform using gmaker.forward, but I bet the library allows a smarter way!

What should my approach be here? Many thanks again
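
A hedged sketch of one possibility, assuming the ligand is read only to obtain a center that is then passed to the explicit-center forward(center, coordset, grid) overload used elsewhere in these reports; file names are illustrative and this is not a confirmed recommendation:

import molgrid
import torch
from openbabel import pybel

gmaker = molgrid.GridMaker()
dims = gmaker.grid_dimensions(molgrid.defaultGninaReceptorTyper.num_types())

# ligand (or pseudo-ligand) is only used to define the grid center
lig = next(pybel.readfile('sdf', 'ligand.sdf'))
center = molgrid.CoordinateSet(lig, molgrid.defaultGninaLigandTyper).center()

# protein-only coordinate set: only the 14 receptor channels get computed
rec = next(pybel.readfile('pdb', 'receptor.pdb'))
rec_cs = molgrid.CoordinateSet(rec, molgrid.defaultGninaReceptorTyper)

grid = torch.zeros(dims, dtype=torch.float32)
gmaker.forward(center, rec_cs, grid)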

Segmentation fault with CoordinateSet in Python

When trying to explicitly build a CoordinateSet object in Python as follows

import molgrid
from openbabel import pybel

obmol = next(pybel.readfile("sdf", "benzene.sdf"))
obmol.addh()

cs = molgrid.CoordinateSet(obmol.OBMol)

I get a segmentation fault:

[1]    13710 segmentation fault (core dumped)  python mg.py

I also tried to pass obmol directly (PyBel molecule) and to pass a molgrid.GninaIndexTyper object as second argument, but always obtained the same error (or just a MemoryError).


OS: Ubuntu 18.04
conda environment:

name: test
channels:
  - conda-forge
  - pytorch
  - open3d-admin
dependencies:
  - python
  - ipython
  - pip
  - numpy
  - scipy
  - openbabel
  - open3d
  - pytorch
  - pip:
    - molgrid

Question about center of the grid

Hello,

I would just like to confirm that I understood something correctly. When using this forward method in GridMaker
forward((GridMaker)arg1, (Example)example, (Grid4f)grid[, (float)random_translation=0.0[, (bool)random_rotation=False]]) → None

it says in the documentation that the grid center will be the center of the last coordinate set before transformation. Will this always be the center of the ligand?

As a side note, this method in the documentation has an extra parameter
"param center:
grid center to use, if not provided will use center of the last coordinate set before transformation"

but the method doesn't have this parameter. Is the method wrong, or is the documentation wrong?

Thank you!

Segmentation fault

Hello again!

I have a problem in which the next() method of ExampleProvider is causing a Segmentation Fault. After digging a bit into it, I found out that this happens if matplotlib (or something else that matplotlib is importing) is imported before molgrid:

>>> import matplotlib
>>> import molgrid
>>>
>>> e = molgrid.ExampleProvider()
>>> e.populate('single.types')
>>>
>>> e.next()
Segmentation fault

This doesn't happen if the order of the imports changes:

>>> import molgrid
>>> import matplotlib
>>> 
>>> e = molgrid.ExampleProvider()
>>> e.populate('single.types')
>>>
>>> e.next()
<molgrid.molgrid.Example object at 0x7fee343d69b0>
>>> 

I was just wondering if you have come by this issue or a similar one before.
Thank you!

nvcc warning : The -std=c++14 flag is not supported with the configured host compiler. Flag will be ignored.

Hi,

I'm working on CentOS7, cuda9.0, anaconda3.

While I am "make"ing gnina, I got such an error:

Scanning dependencies of target libmolgrid_shared

[ 1%] Building CXX object src/CMakeFiles/libmolgrid_shared.dir/libmolgrid.cpp.o

[ 3%] Building CXX object src/CMakeFiles/libmolgrid_shared.dir/atom_typer.cpp.o

[ 5%] Building CXX object src/CMakeFiles/libmolgrid_shared.dir/example.cpp.o

[ 7%] Building CXX object src/CMakeFiles/libmolgrid_shared.dir/exampleref_providers.cpp.o

[ 9%] Building CXX object src/CMakeFiles/libmolgrid_shared.dir/example_extractor.cpp.o

[ 10%] Building CXX object src/CMakeFiles/libmolgrid_shared.dir/example_provider.cpp.o

[ 12%] Building CXX object src/CMakeFiles/libmolgrid_shared.dir/grid_maker.cpp.o

[ 14%] Building CUDA object src/CMakeFiles/libmolgrid_shared.dir/grid_maker.cu.o

nvcc warning : The -std=c++14 flag is not supported with the configured host compiler. Flag will be ignored.

In file included from /usr/include/c++/4.8.2/array:35:0,
from /data/AI_projects/program/gnina/build/libmolgrid/include/libmolgrid/grid_maker.h:13,
from /data/AI_projects/program/gnina/build/libmolgrid/src/grid_maker.cu:1:
/usr/include/c++/4.8.2/bits/c++0x_warning.h:32:2: error: #error This file requires compiler and library support for the ISO C++ 2011 standard. This support is currently experimental, and must be enabled with the -std=c++11 or -std=gnu++11 compiler options.
#error This file requires compiler and library support for the
^
make[2]: *** [src/CMakeFiles/libmolgrid_shared.dir/grid_maker.cu.o] Error 1
make[1]: *** [src/CMakeFiles/libmolgrid_shared.dir/all] Error 2
make: *** [all] Error 2

It seems that -std=c++11 has to be passed during the compilation of libmolgrid. From some research on the internet, it sounds like I can add something like CXXFLAGS=-std=c++11 in the CMakeLists or on the "make" command line.

However, neither of these attempts works; the error always arises during make. I also tried to install libmolgrid separately, and the same error is still there.

Am I missing something?

Question: access or enforce grid center in Gridmaker forward()

Hi,

This seems like a really neat tool. I have been experimenting along the lines of

maker = molgrid.GridMaker(resolution=res, dimension=dimension)
provider = molgrid.ExampleProvider()
provider.populate('my file')
examples = provider.next_batch(batch_size)
_ = maker.forward(examples, input_tensor)

Might be a silly question but is it possible to obtain the center of the grid created with the forward action each step, and/or to enforce a consistent grid center across batches?

From the documentation I believe I might be able to implement this with a Transform object, but I can't quite get the datatypes to match up when passing molgrid.Transform() and example batches to maker.forward().

Any advice is greatly appreciated!

Best,
JP
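
A hedged sketch of one way to both read out and enforce the center without a Transform, by merging each example's coordinate sets and using the explicit-center forward overload that appears in other reports here; the fixed center value and file name are made up:

import molgrid
import torch

maker = molgrid.GridMaker(resolution=0.5, dimension=23.5)
provider = molgrid.ExampleProvider()
provider.populate('my.types')

batch_size = 4
dims = maker.grid_dimensions(provider.num_types())
input_tensor = torch.zeros((batch_size,) + dims, dtype=torch.float32)

fixed_center = (10.0, 12.5, -3.0)   # illustrative coordinates, reused every batch

examples = provider.next_batch(batch_size)
for i in range(batch_size):
    coords = examples[i].merge_coordinates()
    print('natural center of this example:', coords.center())
    maker.forward(fixed_center, coords, input_tensor[i])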

Forces CUDA Device 0

The following occurred when using molgrid with PyTorch. When I set all of my PyTorch tensors to cuda:1, I get an error when running gridmaker.forward:

gmaker.forward(batch_1, input_tensor_1,random_translation=2.0, random_rotation=True)
RuntimeError: CUDA Error: invalid argument

input_tensor_1 was initialized to device cuda:1

Imports fail in "Grid single molecule" tutorial

Hi,

I am trying to follow the tutorial Grid single molecule and unfortunately the import of OpenBabel fails. I am not sure what I could be doing wrong since I'm using a fresh conda installation and it's the third line in the tutorial.

      1 import torch                                                                                                                              
      2 import molgrid                                                                                                                            
----> 3 from molgrid.openbabel import pybel as pybel                                                                                              
                                                                                                                                                  
ModuleNotFoundError: No module named 'molgrid.openbabel'

My installation was done with

conda create -n molgrid python=3.7
source activate molgrid
conda install -c conda-forge rdkit
conda install -c conda-forge openbabel
conda install -c jsunseri molgrid
conda install pytorch
conda install ipython

Many thanks in advance.
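
In case it is useful while debugging, a hedged two-line alternative that skips the molgrid.openbabel re-export and imports pybel from the conda-forge Open Babel bindings directly (this assumes the rest of the tutorial only needs pybel itself):

import molgrid
from openbabel import pybel  # instead of: from molgrid.openbabel import pybel as pybel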

Using GninaVectorTyper creates 'NaN' values

When using the default GninaVectorTyper with ExampleProvider, the PyTorch tensor created with gridmaker.forward has 'NaN' elements in multiple channels.

The following code produces input_tensor_1 with 'NaN' elements in multiple channels.

import molgrid
import torch

test_data = molgrid.ExampleProvider(molgrid.GninaVectorTyper(), shuffle=True, duplicate_first=True, data_root='separated_sets/')
test_data.populate(args.testfile)

gmaker = molgrid.GridMaker()
dims = gmaker.grid_dimensions(molgrid.GninaVectorTyper().num_types()*4)

tensor_shape = (batch_size,) + dims
input_tensor_1 = torch.zeros(tensor_shape, dtype=torch.float32, device='cuda')

batch_1 = test_data.next_batch(batch_size)
gmaker.forward(batch_1, input_tensor_1, random_translation=2.0, random_rotation=True)

Memory leak using CoordinateSet init with OBMol

Hi,

I am currently working on a project where I would like to grid multiple protein pockets (~10 000).
Using the Grid single molecule tutorial, I figured it would be quick to use CoordinateSet objects to create the grids instead of using the ExampleProvider with (gnina) types files. However, when I trained a model with these grids as input, it ran into OOM errors.

A snapshot of my code is this one:

gmaker = molgrid.GridMaker()
grid_dims = gmaker.grid_dimensions(molgrid.defaultGninaReceptorTyper.num_types())
#pdb_ids is a list of ~ 16 pdb_ids in a batch given to a model
mols = [next(pybel.readfile("pdb", f"../v2019-other-PL/{pdb_id}/{pdb_id}_pocket.pdb")) for pdb_id in pdb_ids] 
coord_set = [molgrid.CoordinateSet(mol, molgrid.defaultGninaReceptorTyper) for mol in mols]
batch_dims = (len(coord_set), *grid_dims)
batch_grid = torch.zeros(batch_dims, dtype=torch.float32).to(self.device)
            
for i in range(batch_grid.shape[0]) :
    gmaker.forward(coord_set[i].center(), coord_set[i], batch_grid[i])

I traced the memory leak back to the CoordinateSet initialization line. I created a script showing that the memory usage can be reproduced with the tutorial molecule: creating the CoordinateSet for the sdf 100,000 times takes 1.5 GB of RAM, which would be fine if the dataset contained, say, 100,000 small molecules. However, my dataset contains 10,000 protein pockets, which leads to larger memory leaks and prevents me from running the model for multiple epochs.

import molgrid
import os
import psutil
from molgrid.openbabel import pybel as pybel
import matplotlib.pyplot as plt

memory_usages = []
for _ in range(100000) :
    
    #sdf is the sdf molecule embedded as a string in the tutorial
    mol = pybel.readstring('sdf',sdf)
    coord_set = molgrid.CoordinateSet(mol)

    usage = psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2
    memory_usages.append(usage)
    
plt.plot(memory_usages)
plt.title('Creating CoordinateSet from the OBMol for a single molecule 100 000 times')
plt.xlabel('Iteration')
plt.ylabel('Memory usage (MB)')
plt.savefig('memory_leak_coordinate_set', bbox_inches='tight')

[image: memory usage while creating the CoordinateSet 100,000 times]

A potential fix would be to store the CoordinateSets for all my protein pockets (in a Python dict or in pickle files), but I think it would be easier if we could compute them on the fly at each training iteration.

Do you know if this memory leak can be fixed? If not, do you have another quick and easy way to grid proteins on the fly, or should I just store the CoordinateSets?

I use molgrid v0.5.1 installed with pip.

Thanks a lot

What is the input format?

Could you describe the input format? The documentation is quite scarce. I have only found this tutorial, which gets its data from datadir = os.getcwd() + '/data', but there are not many instructions beyond that.

Let's say I have a PDB file with a receptor, or a ligand in a certain conformation represented as an RDKit molecule object. How could I get a NumPy array or a PyTorch tensor containing a voxelized representation of them?

Thanks a lot in advance.
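
A hedged minimal sketch of the workflow as I understand it from the tutorial and the other reports in this thread: a plain-text .types file where each line is "label receptor_file ligand_file" (paths relative to data_root), fed to an ExampleProvider and voxelized by a GridMaker into a torch tensor (by default centered on the last coordinate set, i.e. the ligand). All file names are illustrative; for an RDKit molecule you would first write it out to a file format Open Babel can read, or go through a CoordinateSet.

import molgrid
import torch

# input.types (one example per line), e.g.:
# 1 receptors/1abc_rec.pdb ligands/1abc_lig.sdf

provider = molgrid.ExampleProvider(data_root='data/')
provider.populate('input.types')

gmaker = molgrid.GridMaker()                        # default resolution and dimension
dims = gmaker.grid_dimensions(provider.num_types())

batch_size = 1
tensor = torch.zeros((batch_size,) + dims, dtype=torch.float32)

gmaker.forward(provider.next_batch(batch_size), tensor)
voxels = tensor.numpy()                             # NumPy view of the voxelized input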

OpenBabel 3

The OpenBabel master branch now contains the alpha release of OpenBabel 3, therefore FindOpenBabel2.cmake does not work anymore.

I'm working on a solution, but it's taking more time than expected...

Fix radius multiple bug

We allow the radius multiple to be set to values other than 1.5, but the math for calculating the switchover from Gaussian to quadratic assumes that fixed value.

Truncated Gaussian if multiple <= 1.0

Behavior for only ligands in types file and ExampleProvider

Hi,

From what I understand, when I have a types file with protein-ligand pairs (3rd and 4th columns) and I use it to populate an ExampleProvider, each example will have 28 channels: the first 14 from the defaultGninaReceptorTyper and the next 14 from the defaultGninaLigandTyper.

What is the behavior when I have a types file that only contains a ligand (3rd and last column)? My guess is that the channels will be the defaultGninaReceptorTyper.

Since I only have a ligand, I will only have 14 channels (ddims = gmaker.grid_dimensions(14)), but how can I make sure that the ligand channels in the Example use the defaultGninaLigandTyper?

Thank you!
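
A hedged sketch of one way to force ligand typing when the types file has no receptor column, based on the two-typer ExampleProvider constructor that appears in another report in this thread (so this is an assumption about its intended use, not a confirmed answer); file names are illustrative:

import molgrid
import torch

# no receptor typer, single molecule column treated as the ligand
e = molgrid.ExampleProvider(molgrid.NullIndexTyper(),
                            molgrid.defaultGninaLigandTyper,
                            data_root='structs/')
e.populate('ligonly.types')

gmaker = molgrid.GridMaker()
dims = gmaker.grid_dimensions(molgrid.defaultGninaLigandTyper.num_types())  # 14 channels

tensor = torch.zeros((1,) + dims, dtype=torch.float32)
gmaker.forward(e.next_batch(1), tensor)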

problem with type conversion?

In the code below, I get an error if I don't explicitly cast to Grid4f, but this isn't an issue if types is 2D (type_vector instead of index). It doesn't make any sense to me and I don't have time to track it down right now...

import molgrid
import torch
import numpy as np

gmaker = molgrid.GridMaker(resolution=0.5,dimension=3.0)

coords = torch.zeros(1,3,dtype=torch.float32)
types = torch.zeros(1,dtype=torch.float32)
radii = torch.ones(1,dtype=torch.float32)
outgrid = torch.zeros(*gmaker.grid_dimensions(1),dtype=torch.float32)

gmaker.forward((0,0,0),coords,types,radii,molgrid.Grid4f(outgrid))

does libmolgrid automatically center the grid around ligand?

Hi,

I want to predict the binding pose using the method described in your CNN paper, but how do I center the receptor and ligand grid on the binding pocket? Or does libmolgrid automatically center the grid on the ligand?

Thank you.

Can't use ExampleProvider.populate()

I'm trying to load .pdb files using molgrid.ExampleProvider().

This is the code I wrote:

import molgrid
import os

thisdir = os.getcwd() + '/'
fname = thisdir + 'bind.types'

exprovider = molgrid.ExampleProvider(data_root=thisdir)
exprovider.populate(fname)

I'm getting this error: ValueError: Missing molecular data in line: 1 9.278 a2weg.pdb

My directory setup is like this:

|-- molvox/
    |-- a2weg.pdb
    |-- bind.types
    |-- molvox.ipynb

And the .types file I'm using has just one line:
1 9.278 a2weg.pdb

What's wrong? Thank you

Could NOT find Boost (missing: python37) [cmake 3.15, boost 1.67.0]

I'm building everything from source on CentOS 7 and encounter the following error when I run cmake for libmolgrid: -

-- Found Threads: TRUE  
-- Found CUDA: /usr/local/cuda-10.2 (found version "10.2") 
-- Found Boost: /opt/boost/include (found version "1.67.0") found components:  regex unit_test_framework program_options system filesystem iostreams 
-- Found PythonLibs: /usr/local/lib/libpython3.7m.so (found version "3.7.5") 
-- Found PythonInterp: /usr/bin/python3 (found version "3.7.5") 
CMake Error at /usr/local/share/cmake-3.15/Modules/FindPackageHandleStandardArgs.cmake:137 (message):
  Could NOT find Boost (missing: python37) (found version "1.67.0")
Call Stack (most recent call first):
  /usr/local/share/cmake-3.15/Modules/FindPackageHandleStandardArgs.cmake:378 (_FPHSA_FAILURE_MESSAGE)
  /usr/local/share/cmake-3.15/Modules/FindBoost.cmake:2161 (find_package_handle_standard_args)
  python/CMakeLists.txt:12 (find_package)

I believe I have installed Boost with Python support, but I'm perplexed by the error. I've not written C/C++ for a long time, and although I don't fully understand the cmake syntax I think I understand the basics. It's the error that is not clear to me. Does it mean cmake has found Boost and Python but does not think Boost was built with Python (which I believe it was)?

My environment is: -

  • centos 7
  • cmake 3.15.5
  • gcc 8.3.1
  • python 3.7.5 (installed from source)
  • boost 1.67.0 (from source)
  • openbabel 3.0.0 (from source)

If I nobble (comment out) the offending line from python/CMakeLists.txt, i.e. if I remove

find_package( Boost COMPONENTS system filesystem python${PYTHON_VERSION_MAJOR}${PYTHON_VERSION_MINOR} REQUIRED )

...then cmake completes and I am able to successfully build libmolgrid (although this is clearly wrong).

1. What have I not done to cause this error? Or...
2. What -D options can I use to safely progress through cmake?
