
schrodinger / gpusimilarity

A Cuda/Thrust implementation of fingerprint similarity searching

License: BSD 3-Clause "New" or "Revised" License

C++ 45.05% Cuda 16.74% Python 29.15% HTML 1.21% CMake 6.35% Dockerfile 1.33% C 0.18%
gpu chemistry cheminformatics similarity-analysis

gpusimilarity's Introduction

gpusimilarity

A brute-force GPU implementation of chemical fingerprint similarity searching. It is intended to run as a long-lived service with an entire library loaded into graphics card memory. The included Python scripts use RDKit to generate fingerprints, but the C++/Cuda backend is agnostic to the fingerprint data once it has been created.
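To make "brute-force similarity searching" concrete, here is a minimal CPU sketch in plain Python. It assumes Tanimoto similarity, a common metric for chemical fingerprints; the text above does not name the metric, and the function names are illustrative rather than part of gpusimilarity.

```python
from typing import List, Tuple


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two fingerprints stored as Python ints (bit sets)."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return bin(a & b).count("1") / union


def brute_force_search(query: int, library: List[int], top_k: int = 3) -> List[Tuple[float, int]]:
    """Score the query against every library entry and return the best top_k."""
    scored = [(tanimoto(query, fp), idx) for idx, fp in enumerate(library)]
    # Highest score first; ties broken by the lower library index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_k]


# Toy 4-bit fingerprints (hypothetical data, not real molecules).
library = [0b1111, 0b1100, 0b0011, 0b1000]
results = brute_force_search(0b1100, library, top_k=2)
print(results)
```

Every library entry is scored independently of the others, which is why this kind of search maps well onto a GPU.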

Architecture and benchmarks were presented at the 2018 RDKit European UGM.

Incentive Version

The commercial GPUSimilarity product ("FPSim GPU") with additional enhancements, maintenance and support is available from Schrödinger. Enhancements to the incentive version will be periodically merged into the open source version, similar to Incentive PyMOL.

Basic Benchmark

On a machine with four Tesla V100, searching one billion compounds takes ~0.2 seconds.

See the RDKit presentation for much more in-depth (though slightly out-of-date) benchmarks.

Example integration

Here is a video of this backend being utilized for immediate-response searching inside Schrödinger's LiveDesign application:

GPUSimilarity Gadget

Using GPUSimilarity

It is highly recommended that you use docker for building/running.

See Our Docker Readme

Dependencies for Building (recommended only for development)

  • RDKit (At Python level, not compilation)
  • Qt 5.2+ (including QtNetwork)
  • PyQt
  • Cuda SDK, CUDACXX env variable pointing to nvcc
  • cmake 3.10.2+
  • C++11 capable compiler
  • Boost test libraries
  • Optional: Doxygen for generating documents

Building with CMake and running unit tests with CTest

Recommended only for development, see Docker

From parent directory of source:
mkdir bld
cd bld
ccmake ../gpusimilarity
make -j5
ctest

If Cuda, boost or doxygen are not found, start ccmake with the following options:

ccmake -DCMAKE_CUDA_COMPILER=/path/to/nvcc -DBOOST_ROOT=/path/to/boost/directory -DDOXYGEN_EXECUTABLE=/path/to/doxygen

Generate the documentation

Install doxygen on system

make doc_doxygen

The result is in bld/doc/html

Running

Recommended only for development, see Docker

For basic json-response http endpoint:

From build directory: python3 ${SRC_DIR}/python/gpusim_server.py <fingerprint fsim file>

For testing (insecure):

From build directory: python3 ${SRC_DIR}/python/gpusim_server.py <fingerprint fsim file> --http_interface

For generating databases:

Easiest from rdkit conda with pyqt installed:

From source python directory: python3 gpusim_createdb.py <input smi.gz file> <fingerprint fsim file>

For debugging Cuda server, avoiding python/http server altogether:

From build directory:
./gpusimserver <dbname>.fsim
python3 ${SRC_DIR}/python/gpusim_search.py <dbname>

Note: No .fsim extension is used for gpusim_search.py

This may be useful to determine if the backend is having Cuda/GPU problems.

gpusimilarity's People

Contributors

christgau, francoisbertelatschrodinger, greglandrum, lilleswing, lorton, sunhwan, torcolvin, vbabin, wisakedjack

gpusimilarity's Issues

Can't get results for search query

Hello,

I'm trying to launch the gpusimilarity server through the pre-built Docker image. I'm able to create a database and launch the server, but when I submit a query I see no results in the browser and the following message in the terminal:

Processing request 112620330
QObject: Cannot create children for a parent that is in a different thread.
(Parent is QNativeSocketEngine(0x2c7d610), parent's thread is QThread(0x2696c80), current thread is QThread(0x7f4c74016f40)
QSocketNotifier: Can only be used with threads started with QThread
Unknown database  "default"  requested.
Search completed, time elapsed: 0

I would appreciate any ideas on why this is happening and how to fix it.

Thanks!

Port fingerprint folding to GPU

Right now the code is relatively GPU-friendly, but only runs on the CPU. It takes about 1 minute per 10 million compounds at startup, so a 1 billion compound database would take almost 2 hours. We could likely speed it up ~100x by moving to the GPU.
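The issue does not say how folding is done. As an illustration only, the sketch below assumes the common scheme of splitting a fingerprint into equal slices and OR-ing them together; `fold_fingerprint` and its parameters are hypothetical names, not the project's API.

```python
def fold_fingerprint(fp: int, nbits: int, factor: int) -> int:
    """Fold an nbits-wide fingerprint down by `factor` by OR-ing equal slices."""
    if factor < 1 or nbits % factor != 0:
        raise ValueError("factor must be >= 1 and divide nbits evenly")
    width = nbits // factor
    mask = (1 << width) - 1
    folded = 0
    for i in range(factor):
        # Take the i-th slice of `width` bits and merge it into the result.
        folded |= (fp >> (i * width)) & mask
    return folded


# An 8-bit toy fingerprint folded to 4 bits: 1001 | 0010 = 1011.
folded = fold_fingerprint(0b10010010, nbits=8, factor=2)
print(bin(folded))
```

Under this assumption each fingerprint is folded without reference to any other, so the work is independent per compound and therefore a natural fit for GPU threads.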

Support multi-GPU setups

We should split up the GPU database across all available GPUs, and perform the searches in parallel and aggregate their results.
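A rough sketch of the split, search and aggregate pattern described here. Worker threads stand in for GPUs, Tanimoto is an assumed metric, and all names are hypothetical; it only shows how per-device top hits can be merged into one global result.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def tanimoto(a: int, b: int) -> float:
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


def search_shard(query: int, shard: List[int], offset: int, top_k: int) -> List[Tuple[float, int]]:
    """Top hits within one shard, reported with global library indices."""
    scored = ((tanimoto(query, fp), offset + i) for i, fp in enumerate(shard))
    return heapq.nlargest(top_k, scored)


def sharded_search(query: int, library: List[int], n_devices: int, top_k: int) -> List[Tuple[float, int]]:
    size = max(1, -(-len(library) // n_devices))  # ceiling division
    shards = [(library[s:s + size], s) for s in range(0, len(library), size)]
    with ThreadPoolExecutor(max_workers=n_devices) as pool:
        partial = pool.map(lambda sh: search_shard(query, sh[0], sh[1], top_k), shards)
        # Each shard returns at most top_k hits, so the merge step stays small.
        return heapq.nlargest(top_k, (hit for hits in partial for hit in hits))


library = [0b1111, 0b1100, 0b0011, 0b1000, 0b1110]  # toy data
hits = sharded_search(0b1100, library, n_devices=2, top_k=2)
print(hits)
```

Because each shard contributes no more than `top_k` candidates, the aggregation cost depends on the number of devices and on `top_k`, not on the library size.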

error in make

Hello lorton,

I am following the instructions to build gpusimilarity. Everything seems fine until the last step. When I ran make -j5, I got the following error message:

[ 11%] Building CXX object CMakeFiles/fastsim.dir/fastsim.cpp.o
[ 22%] Building CUDA object CMakeFiles/fastsim.dir/fingerprintdb_cuda.cu.o
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
[ 33%] Linking CXX static library libfastsim.a
[ 33%] Built target fastsim
[ 44%] Building CXX object CMakeFiles/fastsimserver.dir/main.cpp.o
[ 55%] Linking CUDA device code CMakeFiles/fastsimserver.dir/cmake_device_link.o
[ 66%] Linking CXX executable fastsimserver
/home/jizhou/install/anaconda/lib/libQt5Core.so.5.9.5: undefined reference to `__cxa_throw_bad_array_new_length@CXXABI_1.3.8'
/home/jizhou/install/anaconda/lib/libQt5Network.so.5.9.5: undefined reference to `operator delete(void*, unsigned long)@CXXABI_1.3.9'
/home/jizhou/install/anaconda/lib/libQt5Network.so.5.9.5: undefined reference to `operator delete[](void*, unsigned long)@CXXABI_1.3.9'
collect2: error: ld returned 1 exit status
make[2]: *** [fastsimserver] Error 1
make[1]: *** [CMakeFiles/fastsimserver.dir/all] Error 2
make: *** [all] Error 2

I did some research on this, but not able to find the solution. Do you have any suggestion?

Thanks

Merging multiple databases not working

Hey @lorton

I tried to launch a search over a huge database and used gpusim_mergedb.py to process the database files in parallel. However, merging didn't work. It created empty files. After a little bit of digging into the problem, I found the cause.

The gpusim_createdb.py writes 4 values to the top of each database

    qds.writeInt(DATABASE_VERSION)
    qds.writeString(args.dbkey.encode())
    qds.writeInt(gpusim_utils.BITCOUNT)
    qds.writeInt(count)

However, gpusim_mergedb.py reads (and then writes to the merged fsim file) only 3 of them: everything except dbkey. Reading the dbkey from each database first and then writing it to the merged file solves the problem.
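The effect of skipping one header field can be reproduced with a small stand-in. The sketch below uses Python's `struct` module and an assumed length-prefixed string layout instead of the project's QDataStream format, so the byte layout and the sample values are hypothetical; only the field order (version, dbkey, bit count, count) comes from the snippet above.

```python
import io
import struct


def write_header(stream, version: int, dbkey: str, bitcount: int, count: int) -> None:
    key = dbkey.encode()
    stream.write(struct.pack(">i", version))
    stream.write(struct.pack(">I", len(key)) + key)  # length-prefixed string (assumed layout)
    stream.write(struct.pack(">ii", bitcount, count))


def read_header(stream):
    """Reads all four fields in the order they were written."""
    (version,) = struct.unpack(">i", stream.read(4))
    (key_len,) = struct.unpack(">I", stream.read(4))
    dbkey = stream.read(key_len).decode()
    bitcount, count = struct.unpack(">ii", stream.read(8))
    return version, dbkey, bitcount, count


def read_header_skipping_key(stream):
    """The buggy pattern: reads three ints and never consumes dbkey."""
    return struct.unpack(">iii", stream.read(12))


buf = io.BytesIO()
write_header(buf, version=2, dbkey="abc", bitcount=1024, count=500)

buf.seek(0)
good = read_header(buf)
buf.seek(0)
bad = read_header_skipping_key(buf)
print(good)
print(bad)
```

In the buggy reader the bytes of the key are interpreted as the bit count and compound count, so everything read after the header is misaligned.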

gpu server launching issue

Hi,

I'm facing an issue when running the GPU server command; please see the screenshot below for details. Can you please help me out here?

image

Regards,
Suneel

Structure fingerprint not matching

For some reason ZINC92580503 (CCCc1c2c(n(n1)C)n(c(n2)N)c3ccccc3) isn't matching against a current build of the database using default settings (Morgan fingerprint) - it's only coming back as a 68% match.

Through a good deal of testing, this seems to be quite rare or entirely unique to this compound, but it is very worrisome.

Get building working for open source version

The original version was written inside the Schrödinger build system; the open source version needs a build that compiles more generically.

The obvious known dependencies are:
Cuda, Qt, C++11, Python3, PyQt

Rearchitect code so single backend is aware of all databases

Right now we wrap many different instances of the GPUSimilarity backend in the Python script fastsim_server.py, then submit searches to each instance of the backend. This makes loading up multiple databases that all need to be shrunk impossible, so we need to rearchitect so that a single Cuda backend knows about and searches all the FingerprintDBs itself.

Allow searching on data sets that don't fit in GPU memory

GPUSimilarity should support searching multi-billion compound sets, which won't fit in GPU memory. We should figure out an efficient tactic for handling this.

NOTE: It's fair to expect the entire data set to fit in system memory.

'qInfo' was not declared in this scope

Hello,

We are trying to install gpusimilarity on a CentOS 6 machine. We followed these steps:

From parent directory of source:
mkdir bld
cd bld
ccmake ../gpusimilarity
make -j5
ctest

When we run "make -j5", the build fails with the error message "'qInfo' was not declared in this scope". We tried many ways to solve the problem but failed. Do you have any suggestions?

Thanks

building error

I have tried to install this tool and failed. I have rebuilt all the dependencies successfully but failed again:
Installed so far:
gcc-8.1.0
Python 3.7.0 with numpy, sip, pyqt5, rapidjson, simplejson
cmake-3.12.2
boost 1.67.0
cairo1.15.12
non-python linked rapidjson1.1.0
non-python linked eigen3.3
rdkit
Qt5.10.1
And
/home/user/progs/cmake/bld/bin/ccmake
-DCMAKE_INSTALL_PREFIX:PATH=/home/user/progs/gpusimilarity/bld
-DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc
-DBoost_DIR=/home/user/progs/boost/boost
-DBoost_INCLUDE_DIR=/home/user/progs/boost/boost
-DBoost_LIBRARY_DIR_DEBUG=/home/user/progs/boost/boost/stage/lib
-DBoost_LIBRARY_DIR_RELEASE=/home/user/progs/boost/boost/stage/lib
-DBoost_UNIT_TEST_FRAMEWORK_LIBR=/home/user/progs/boost/boost/stage/lib/libboost_unit_test_framework.so
-DSLURM_SBATCH_COMMAND=/usr/local/slurm/bin/sbatch
-DSLURM_SRUN_COMMAND=/usr/local/slurm/bin/srun
-DCMAKE_CXX_COMPILER=/home/user/progs/gcc/bld/bin/g++
-DCMAKE_CXX_COMPILER_AR=/home/user/progs/gcc/bld/bin/gcc-ar
-DCMAKE_CXX_COMPILER_RANLIB=/home/user/progs/gcc/bld/bin/gcc-ranlib
-DQt5Core_DIR=/home/user/progs/qt5/bld/lib/cmake/Qt5Core
-DQt5Network_DIR=/home/user/progs/qt5/bld/lib/cmake/Qt5Network
-DQt5Concurrent_DIR=/home/user/progs/qt5/bld/lib/cmake/Qt5Concurrent
/home/user/progs/gpusimilarity/
ccmake finished successfully.
When typing: make
[ 45%] Built target gpusim
[ 54%] Linking CXX executable gpusimserver
libgpusim.a(gpusim.cpp.o): In function `gpusim::GPUSimServer::GPUSimServer(QStringList const&, int)':
gpusim.cpp:(.text+0x88f): undefined reference to `std::invalid_argument::invalid_argument(char const*)'
libgpusim.a(gpusim.cpp.o): In function `gpusim::GPUSimServer::extractData(QString const&, int&, int&, std::vector<std::vector<char, std::allocator<char> >, std::allocator<std::vector<char, std::allocator<char> > > >&, std::vector<char*, std::allocator<char*> >&, std::vector<char*, std::allocator<char*> >&)':
gpusim.cpp:(.text+0xe04): undefined reference to `std::runtime_error::runtime_error(char const*)'
collect2: error: ld returned 1 exit status
make[2]: *** [gpusimserver] Error 1
make[1]: *** [CMakeFiles/gpusimserver.dir/all] Error 2
make: *** [all] Error 2

When typing make -j5:
[ 63%] Linking CXX executable test_gpusim
CMakeFiles/test_gpusim.dir/test_gpusim.cpp.o: In function `boost::basic_wrap_stringstream<char>::basic_wrap_stringstream()':
test_gpusim.cpp:(.text._ZN5boost23basic_wrap_stringstreamIcEC2Ev[_ZN5boost23basic_wrap_stringstreamIcEC5Ev]+0x19): undefined reference to `std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >::basic_ostringstream(std::_Ios_Openmode)'
test_gpusim.cpp:(.text._ZN5boost23basic_wrap_stringstreamIcEC2Ev[_ZN5boost23basic_wrap_stringstreamIcEC5Ev]+0x2b): undefined reference to `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string()'
...

Can you suggest what I should do?

Best regards,

Peter Pogany

Switch CPU sorting from home-spun bubble sort

We are currently using a home-spun bubble sort with a guaranteed running time of return_count * N, which was acceptable while return_count was very small. Now that users can define return_count themselves and could set it to a large value, this implementation scales poorly.
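One possible replacement is a bounded-heap selection, which costs O(N log k) instead of O(k * N). The sketch below is in Python for brevity; the issue does not show the existing code, so `k_pass_top` is a hypothetical stand-in with the same cost profile, and `heap_top` is a suggestion rather than the project's chosen fix.

```python
import heapq
from typing import List, Tuple


def k_pass_top(scores: List[float], k: int) -> List[Tuple[int, float]]:
    """k passes over the scores, one maximum per pass: O(k * N) comparisons."""
    remaining = list(enumerate(scores))
    picked = []
    for _ in range(min(k, len(remaining))):
        best = max(range(len(remaining)), key=lambda j: remaining[j][1])
        picked.append(remaining.pop(best))
    return picked


def heap_top(scores: List[float], k: int) -> List[Tuple[int, float]]:
    """Single pass with a heap of at most k items: O(N log k)."""
    return heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])


scores = [0.2, 0.9, 0.5, 0.7]  # toy similarity scores
print(k_pass_top(scores, 2))
print(heap_top(scores, 2))
```

Both functions return the same (index, score) pairs for distinct scores; the difference only shows up in running time as k grows.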
