GithubHelp home page GithubHelp logo

arikguitarik / deepmol Goto Github PK

View Code? Open in Web Editor NEW
2.0 2.0 2.0 258 KB

Deep Learning-Based Property Prediction and Generation of Molecules

License: MIT License

Python 89.96% Jupyter Notebook 10.04%
deep-learning quantum-chemistry molecular-design vae machine-learning neural-networks

deepmol's Introduction

DeepMol – Deep Learning-Based Property Prediction and Generation of Molecules

A TensorFlow implementation of deep learning models for property prediction and generation of molecules, featurized by atom types and molecular geometry, trained on the QM9 data set. Written for my master's thesis at Freie Universität Berlin.

Requirements
  • tensorflow
  • sklearn
  • numpy, scipy
  • rdkit
  • pybel, openbabel
Data Set

A shuffled and partitioned version of the data set that is ready-to-use for this project can be downloaded here.

Message Passing Neural Network

Property Prediction is performed with a message passing neural network (MPNN, see Neural Message Passing for Quantum Chemistry):

  • A hidden state vector is assigned to each atom.
  • Based on their interatomic distance, the atoms "pass messages" to each other via continuous convolutions.
  • A Gated Recurrect Unit is used to update the hidden states based on the incoming messages.
  • After multiple message passes, a read-out function (based on an LSTM and an attention mechanism) creates a vector embedding of the molecule which can be mapped to the respective property.

With implicit hydrogen and on a test set of 10k molecules, the model achieved a mean absolute error of 0.42kcal/mol on atomization energy.

Use train_mpnn.py to run trainings and evaluations.

Molecular Variational Autoencoder

A message passing-based encoder is extended by a fully-connected neural network decoder (see figure below).

When using it as an autoencoder, the model reconstructs heavy atoms with an accuracy of 99.9% and atom positions with a mean absolute error of 0.05Å.

When sampling molecules from the latent prior in VAE mode, the atoms have a mean absolute deviation of 0.4Å from their position after a DFT-based geometry optimization.

Loss
  • atom types: cross entropy
  • distances: mean squared error, weighted by the probability that the involved atoms exist in both molecules
  • geometry: an additional term penalizes violations of the triangle inequality. It did not turn out to improve model performance though
Evaluation
  • reconstruction accuracy of atom types
  • reconstruction error of atom positions
  • SMILES accuracy: for original and reconstruction, infer bonds, generate SMILES and compare
  • generated mols: DFT-based geometry optimization is performed using orca (see respective notebook)

Use train_vae.py to run trainings and evaluations. To use the model as an autoencoder, set the parameter variational to false.

Architecture of the MolVAE

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.