
aimnetnse's Introduction

AIMNet-NSE: Prediction of energies and spin-polarized charges with neural network potential

This repository contains supplementary data and code for the manuscript

"Teaching a neural network to attach and detach electrons from molecules" by Roman Zubatyuk, Justin S. Smith, Benjamin T. Nebgen, Sergei Tretiak, Olexandr Isayev https://www.nature.com/articles/s41467-021-24904-0

Models

The models directory contains JIT-compiled PyTorch AIMNet-NSE models. Five models were trained on 80/20 cross-validation splits of the training dataset; averaging the predictions of all five is advised for the most accurate results. The models were trained on neutral and ion-radical states of non-equilibrium conformations of organic molecules containing the elements {H, C, N, O, F, Si, P, S, Cl}. Given a molecular conformation and charge state, a model predicts PBE0/ma-def2-SVP energies and NBO spin-polarized partial charges, as well as derived properties such as ionization potential, electron affinity, Fukui functions, electronegativity, and hardness.

The models can be loaded with the torch.jit.load function. As input, they accept a single argument of type Dict[str, Tensor] with the following data:

coords: shape (m, n, 3) - atomic coordinates in Angstrom 
numbers: shape (m, n) - atomic numbers
charge: shape (m, 2) - total alpha and beta molecular charges
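
As a rough sketch of how the pieces above might fit together, the snippet below loads the compiled models, builds the input dictionary, and averages the ensemble predictions. The function names (load_models, make_input, average_outputs, predict), the tensor dtypes, and the assumption that each model returns a dictionary of tensors are illustrative and not taken from the repository; the default glob pattern mirrors the models/aimnet-nse-cv?.jpt path used with eval.py.

```python
import glob
from typing import Dict, List, Sequence


def load_models(pattern: str = "models/aimnet-nse-cv?.jpt") -> List:
    """Load every JIT-compiled model matching the glob pattern onto the CPU."""
    import torch  # imported lazily so average_outputs works without PyTorch

    return [torch.jit.load(path, map_location="cpu")
            for path in sorted(glob.glob(pattern))]


def make_input(coords, numbers, ab_charges) -> Dict:
    """Build the Dict[str, Tensor] input; the dtypes here are assumptions."""
    import torch

    return {
        "coords": torch.tensor(coords, dtype=torch.float32),      # (m, n, 3), Angstrom
        "numbers": torch.tensor(numbers, dtype=torch.long),       # (m, n)
        "charge": torch.tensor(ab_charges, dtype=torch.float32),  # (m, 2), alpha and beta
    }


def average_outputs(outputs: Sequence[Dict]) -> Dict:
    """Average each key over the per-model outputs (tensors or plain floats)."""
    n_models = len(outputs)
    return {key: sum(out[key] for out in outputs) / n_models for key in outputs[0]}


def predict(models: Sequence, data: Dict) -> Dict:
    """Run every model on the same input and return the ensemble mean.

    Assumes each model returns a dict of tensors with identical keys.
    """
    import torch

    with torch.no_grad():
        return average_outputs([model(data) for model in models])
```

With the model files in place, a neutral singlet water molecule could then be evaluated as predict(load_models(), make_input([[[0.0, 0.0, 0.0], [0.0, 0.76, 0.59], [0.0, -0.76, 0.59]]], [[8, 1, 1]], [[0.0, 0.0]])). The approximate water geometry is only a placeholder.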

For convenience, the eval.py script has a function that converts charge and multiplicity to the total alpha and beta molecular charges as: ab_charges = 0.5 * torch.stack([charge - mult + 1, charge + mult - 1])
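
To make the conversion concrete, here is the same arithmetic in plain Python for a single molecule. The helper name alpha_beta_charges is hypothetical; only the formula itself comes from the text above. Note that the two values always sum to the total charge and differ by mult - 1.

```python
from typing import Tuple


def alpha_beta_charges(charge: int, mult: int) -> Tuple[float, float]:
    """Scalar version of 0.5 * [charge - mult + 1, charge + mult - 1]."""
    alpha = 0.5 * (charge - mult + 1)
    beta = 0.5 * (charge + mult - 1)
    return alpha, beta


# The three charge states covered by the models.
states = [
    ("neutral singlet", 0, 1),
    ("cation radical", 1, 2),
    ("anion radical", -1, 2),
]
for label, charge, mult in states:
    print(label, alpha_beta_charges(charge, mult))
# neutral singlet (0.0, 0.0)
# cation radical (0.0, 1.0)
# anion radical (-1.0, 0.0)
```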

Test datasets

The Ions-16 and ChEMBL-20 datasets are available at http://doi.org/10.5281/zenodo.5007980

The datasets contain PBE0/ma-def2-SVP energies and NBO atomic charges for non-equilibrium conformers of neutral organic molecules randomly sampled from the PubChem database (Ions-16) and for B97-3c optimized conformations of neutral organic molecules randomly sampled from the ChEMBL database (ChEMBL-20). The number in the dataset name is the maximum number of non-hydrogen atoms in the molecules. The dataset used for training the AIMNet-NSE model contains molecules with up to 12 non-hydrogen atoms, whereas Ions-16 and ChEMBL-20 contain molecules with 13 or more non-hydrogen atoms.

The datasets are formatted as HDF5 files. Data group names have the format _???, where ??? is the number of atoms in the molecules. Each group contains data for M molecules, each having N atoms. The groups contain the following datasets:

Name      Data type   Shape      Description
mol_id    S24         M          Molecule ID
coord     float32     M, N, 3    Cartesian coordinates, Å
numbers   uint8       M, N       Atomic numbers
charge    int8        M          Molecular charge
mult      uint8       M          Spin multiplicity
energy    float64     M          PBE0/ma-def2-SVP energy, eV
charges   float32     M, N, 2    α and β NBO charges

The molecule ID is a hash of molecular conformation. In each group, there are up to 3 entries with the same mol_id value, but with different charge. Those correspond to neutral, cation-radical and anion-radical states.
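
The sketch below illustrates one way the shared mol_id could be used: it groups entries by molecule ID and computes reference vertical ionization potentials and electron affinities from the stored energies, using the usual energy-difference definitions (IP = E(cation) - E(neutral), EA = E(neutral) - E(anion)). Reading the file with h5py and the helper names iter_groups and vertical_ip_ea are assumptions for illustration; the dataset names follow the table above.

```python
from collections import defaultdict
from typing import Dict, Iterable


def iter_groups(path: str):
    """Yield (group name, mol_id, charge, energy) arrays for each HDF5 group."""
    import h5py  # third-party; imported lazily so the helper below stays pure Python

    with h5py.File(path, "r") as h5:
        for name, group in h5.items():
            yield name, group["mol_id"][:], group["charge"][:], group["energy"][:]


def vertical_ip_ea(mol_ids: Iterable, charges: Iterable, energies: Iterable) -> Dict:
    """Group entries by mol_id and compute IP/EA (eV) where the states exist."""
    by_mol = defaultdict(dict)
    for mol_id, charge, energy in zip(mol_ids, charges, energies):
        by_mol[mol_id][int(charge)] = float(energy)

    result = {}
    for mol_id, e in by_mol.items():
        entry = {}
        if 0 in e and 1 in e:
            entry["ip"] = e[1] - e[0]    # cation minus neutral
        if 0 in e and -1 in e:
            entry["ea"] = e[0] - e[-1]   # neutral minus anion
        result[mol_id] = entry
    return result
```

For example, looping over iter_groups("test_datasets/chembl20.h5") and passing each group's arrays to vertical_ip_ea would give per-molecule reference values to compare against model predictions.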

The test datasets can be evaluated with the AIMNet-NSE models using the eval.py script:

python eval.py test_datasets/chembl20.h5 models/aimnet-nse-cv?.jpt

Inference script

usage: eval_mols.py [-h] [--models MODELS [MODELS ...]] [--in-file [IN_FILE]]
                    [--out OUT] [--allow-charged]

optional arguments:
  -h, --help            show this help message and exit
  --models MODELS [MODELS ...]
  --in-file [IN_FILE]   Multi-molecule input file. Extension should be an
                        acceptable to OpenBabel file type.
  --out OUT             Output multi-line JSON file with computed properties.
  --allow-charged       Skip check for molecule neutral charge. Useful for
                        reading XYZ files, e.g. when OpenBabel guess for
                        molecular charge is wrong.

The script reads several files with compiled models and constructs an ensemble AIMNet-NSE model. For each molecule in the input file, it calculates a set of properties and writes a JSON-formatted dict to the output file (stdout by default). The output keys are: energy, charges, ip, ea, f_el, f_nuc, f_rad, chi, eta, omega, omega_el, omega_nuc, omega_rad. The units are eV and e.
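
A minimal sketch of reading the output back into Python, assuming the file holds one JSON object per line (which is how "multi-line JSON" is interpreted here). The helper name read_results is hypothetical, and the sample records contain made-up numbers that only show the expected shape.

```python
import json
from typing import Dict, Iterable, List


def read_results(lines: Iterable[str]) -> List[Dict]:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


# Placeholder records with invented values, not real model output.
sample = [
    '{"energy": -1.0, "ip": 9.5, "ea": -0.5, "charges": [0.1, -0.1]}',
    '',
    '{"energy": -2.0, "ip": 8.0, "ea": 0.25, "charges": [0.0, 0.0]}',
]
records = read_results(sample)
for rec in records:
    print(rec["energy"], rec["ip"], rec["ea"])
```

Since read_results accepts any iterable of lines, an open file handle for the path given to --out can be passed in directly.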


aimnetnse's Issues

ASE Calculator

Hi, everyone.

I'm writing this issue post to request guidance on using AIMNetNSE with the Atomic Simulation Environment (ASE). Specifically, I'm creating a calculator based on the one written for AIMNet. I'd like to know if there's an existing ASE calculator I could use or if anyone has any tips on going about this process. I'd be happy to collaborate with others who are also working on this.

Thanks in advance for your help!

License

Hello,

Could you add a license to this repo, or let us know what the license is?

Thanks,
Adam
