dMaSIF - Fast end-to-end learning on protein surfaces

Abstract

Proteins’ biological functions are defined by the geometric and chemical structure of their 3D molecular surfaces. Recent works have shown that geometric deep learning can be used on mesh-based representations of proteins to identify potential functional sites, such as binding targets for potential drugs. However, using meshes as the underlying representation of protein structure has several drawbacks, including the need to pre-compute the input features and the mesh connectivity. This becomes a bottleneck for many important tasks in protein science.

In this paper, we present a new framework for deep learning on protein structures that addresses these limitations. Among the key advantages of our method are the computation and sampling of the molecular surface on-the-fly from the underlying atomic point cloud and a novel efficient geometric convolutional layer. As a result, we are able to process large collections of proteins in an end-to-end fashion, taking as the sole input the raw 3D coordinates and chemical types of their atoms, eliminating the need for any hand-crafted pre-computed features.

To showcase the performance of our approach, we test it on two tasks in the field of protein structural bioinformatics: the identification of interaction sites and the prediction of protein-protein interactions. On both tasks, we achieve state-of-the-art performance with much faster run times and fewer parameters than previous models. These results will considerably ease the deployment of deep learning methods in protein science and open the door for end-to-end differentiable approaches in protein modeling tasks such as function prediction and design.

Hardware requirements

Models have been trained on either a single NVIDIA RTX 2080 Ti or a single Tesla V100 GPU. Time and memory benchmarks were performed on a single Tesla V100.

Software prerequisites

Scripts have been tested using the following two sets of core dependencies:

Dependency          First Option    Second Option
GCC                 7.5.0           8.4.0
CMAKE               3.10.2          3.16.5
CUDA                10.0.130        10.2.89
cuDNN               7.6.4.38        7.6.5.32
Python              3.6.9           3.7.7
PyTorch             1.4.0           1.6.0
PyKeops             1.4             1.4.1
PyTorch Geometric   1.5.0           1.6.1
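
To check a local environment against the two tested sets, something like the following may help. It is a minimal sketch covering only the Python-level dependencies from the table above; the import names (torch, pykeops, torch_geometric) and the presence of a __version__ attribute on each module are assumptions, and the helper names are made up for this example.

```python
import importlib
import platform

# Tested versions copied from the table above: (first option, second option).
TESTED_VERSIONS = {
    "Python": ("3.6.9", "3.7.7"),
    "PyTorch": ("1.4.0", "1.6.0"),
    "PyKeops": ("1.4", "1.4.1"),
    "PyTorch Geometric": ("1.5.0", "1.6.1"),
}

# Assumed import name for each package.
MODULE_NAMES = {
    "PyTorch": "torch",
    "PyKeops": "pykeops",
    "PyTorch Geometric": "torch_geometric",
}


def installed_versions():
    """Return the version string of Python and of each importable package."""
    found = {"Python": platform.python_version()}
    for name, module_name in MODULE_NAMES.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            found[name] = None  # package is not installed
        else:
            found[name] = getattr(module, "__version__", None)
    return found


def _same(installed, tested):
    """Compare versions, ignoring local build suffixes such as '+cu102'."""
    return installed is not None and installed.split("+")[0] == tested


def matching_option(found):
    """Return 'first' or 'second' if every version matches that column, else None."""
    for index, label in enumerate(("first", "second")):
        if all(
            _same(found.get(name), versions[index])
            for name, versions in TESTED_VERSIONS.items()
        ):
            return label
    return None


if __name__ == "__main__":
    versions = installed_versions()
    for name, version in versions.items():
        print(name, version)
    print("Matches tested option:", matching_option(versions))
```

A result of None only means the environment differs from both tested sets; other version combinations may still work.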

Code overview

Usage:

  • In order to train models, run main_training.py with the appropriate flags. Available flags and their descriptions can be found in Arguments.py.

  • The command line options needed to reproduce the benchmarks can be found in benchmark_scripts/.

  • To run inference on the test set using pretrained models, run main_inference.py with the same flags that were used for training. Note that the --experiment_name flag should be modified to specify the training epoch to use.
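
The two entry points above could be driven from a small Python wrapper, sketched below. Only the script names and the --experiment_name flag come from this README; the experiment names are placeholders, the helper functions are hypothetical, and any further flags would have to be taken from Arguments.py or benchmark_scripts/. How the epoch is encoded in the experiment name is not described here, so the inference name is left as an illustrative value.

```python
import subprocess
import sys


def build_command(script, experiment_name, extra_flags=()):
    """Assemble the argument list for one of the repository's entry points."""
    return [sys.executable, script, "--experiment_name", experiment_name, *extra_flags]


def run(command, dry_run=True):
    """Print the command, and execute it only when dry_run is False."""
    print(" ".join(command))
    if not dry_run:
        subprocess.run(command, check=True)


# Placeholder name for the training run.
train_cmd = build_command("main_training.py", "my_experiment")

# Placeholder name; the real value must identify the training epoch to load.
infer_cmd = build_command("main_inference.py", "my_experiment_checkpoint")

run(train_cmd)  # dry run: prints the command without launching training
run(infer_cmd)  # dry run: prints the command without launching inference
```

Passing dry_run=False launches the script from the current directory, so the wrapper would need to be run from the repository root.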

Implementation:

  • Our surface generation algorithm, curvature estimation method and quasi-geodesic convolutions are implemented in geometry_processing.py.

  • The definition of the neural network along with surface and input features can be found in model.py. The convolutional layers are implemented in benchmark_models.py.

  • The scripts used to generate the figures of the paper can be found in data_analysis/.
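
To give a feel for what computing a surface from the atomic point cloud can look like, here is a minimal pure-Python sketch. It assumes a log-sum-exp soft minimum over atom-to-point distances as the smooth distance function, and it uses a brute-force grid search to find points near one level set. This is an illustration under those assumptions, not the code in geometry_processing.py; all function names and parameter values are invented for the example.

```python
import itertools
import math


def smooth_distance(point, atoms, sigma=1.0):
    """Soft minimum of the Euclidean distances from `point` to each atom.

    Computed as -sigma * log(sum_k exp(-d_k / sigma)), subtracting the
    largest exponent first for numerical stability.
    """
    distances = [
        math.sqrt(sum((p - a) ** 2 for p, a in zip(point, atom)))
        for atom in atoms
    ]
    scaled = [-d / sigma for d in distances]
    peak = max(scaled)
    return -sigma * (peak + math.log(sum(math.exp(s - peak) for s in scaled)))


def sample_level_set(atoms, level=1.0, spacing=0.25, tolerance=0.1, sigma=1.0):
    """Return grid points whose smooth distance is within `tolerance` of `level`."""
    # The soft minimum is at most sigma * log(N) below the true minimum
    # distance, so this padding is enough to enclose the level set.
    pad = level + sigma * math.log(len(atoms)) + spacing
    axes = []
    for axis in range(3):
        low = min(atom[axis] for atom in atoms) - pad
        high = max(atom[axis] for atom in atoms) + pad
        steps = int(math.ceil((high - low) / spacing)) + 1
        axes.append([low + i * spacing for i in range(steps)])
    return [
        point
        for point in itertools.product(*axes)
        if abs(smooth_distance(point, atoms, sigma) - level) < tolerance
    ]


if __name__ == "__main__":
    # Two made-up atom centres, in arbitrary length units.
    toy_atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
    surface_points = sample_level_set(toy_atoms)
    print(len(surface_points), "grid points lie near the level set")
```

The grid search scales poorly with protein size and is not differentiable; it is used here only because it keeps the example short and dependency-free.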

License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Reference

Sverrisson, F., Feydy, J., Correia, B. E., & Bronstein, M. M. (2020). Fast end-to-end learning on protein surfaces. bioRxiv.
