wutobias / r2z

A Python class for building a Z-matrix from an RDKit molecule. It can also transform coordinates between Z-matrix and Cartesian space.

License: MIT License

Python 100.00%

r2z's Introduction

Z-matrix conversion with RDKit molecules

With this small Python implementation, RDKit molecules can be converted to a Z-matrix topology representation. Given Cartesian coordinates for the RDKit molecule, one can also generate the Z-matrix coordinates, or convert Z-matrix coordinates back to Cartesian coordinates. It is easy to make a mistake during these conversions by using the wrong units. The openmm package is therefore used to handle units and to enforce the correct ones at every step.

Coordinate transformations

The transformation from Cartesian coordinates to Z-matrix coordinates is straightforward. The back conversion is trickier and more error-prone. One difficulty is avoiding numerical instabilities (for instance from round-off errors), since these can propagate through the Z-matrix during the conversion. To circumvent this, I implemented the Natural Extension Reference Frame (NeRF) algorithm, which minimizes these errors. The algorithm is described in detail in these two articles: https://doi.org/10.1002/jcc.20237 and https://doi.org/10.1002/jcc.25772
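The forward direction (Cartesian to internal coordinates) can be sketched with plain NumPy. This is only an illustration, not the r2z implementation: the function names are hypothetical, the points are made up, and the values are plain floats, whereas r2z attaches units to every quantity.

```python
import numpy as np

def bond_length(p1, p2):
    """Distance between two points."""
    return float(np.linalg.norm(p2 - p1))

def bond_angle(p1, p2, p3):
    """Angle p1-p2-p3 in radians, with p2 at the vertex."""
    v1 = p1 - p2
    v2 = p3 - p2
    cos_t = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    # Clip to guard against round-off pushing the cosine outside [-1, 1].
    return float(np.arccos(np.clip(cos_t, -1.0, 1.0)))

def dihedral_angle(p1, p2, p3, p4):
    """Signed torsion p1-p2-p3-p4 in radians, in the range (-pi, pi]."""
    b1 = p2 - p1
    b2 = p3 - p2
    b3 = p4 - p3
    n1 = np.cross(b1, b2)  # normal of the plane p1-p2-p3
    n2 = np.cross(b2, b3)  # normal of the plane p2-p3-p4
    x = np.dot(n1, n2)
    y = np.dot(np.cross(n1, n2), b2 / np.linalg.norm(b2))
    return float(np.arctan2(y, x))

# Four made-up points, used only to exercise the functions.
pts = np.array([[0.0, 1.0, 0.0],
                [0.0, 0.0, 0.0],
                [1.0, 0.0, 0.0],
                [1.0, 0.0, 1.0]])

r = bond_length(pts[2], pts[3])
theta = bond_angle(pts[1], pts[2], pts[3])
phi = dihedral_angle(*pts)
print(r, np.rad2deg(theta), np.rad2deg(phi))
```

Using arctan2 for the torsion keeps the sign of the angle and avoids the loss of precision that arccos suffers near 0 and 180 degrees.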

A general remark on transformations: converting Z-matrix coordinates to Cartesian coordinates requires 3 reference Cartesian coordinates (e.g. coordinates in a host or protein molecule) plus 3 torsion angles, 2 angles and 1 bond length defined with respect to these coordinates. The torsion angles, angles and bond length are also called 'virtual coordinates'.

For instance, say we have the Z-matrix coordinates for a molecule (e.g. a ligand) with 5 atoms, A-B-C-D-E, and we want to convert them back to Cartesian space. First, we need 3 Cartesian coordinates X, Y and Z in the reference frame. These can be essentially anything in the lab coordinate system, as long as they are well defined. Second, we need the torsion angles X-Y-Z-A, Y-Z-A-B and Z-A-B-C, the angles Y-Z-A and Z-A-B, and the bond length Z-A. By convention, the Cartesian coordinates of our molecule A-B-C-D-E will be shifted to the position of Z. All of these virtual coordinates can be calculated with this Python class.
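A minimal sketch of how the first three atoms could be placed from a reference frame, using a NeRF-style placement step. This is not the r2z code: `place_atom` is a hypothetical helper, all numbers are made up, and values are plain floats (lengths in nanometers, angles in radians) rather than unit-bearing quantities. The bond lengths A-B and B-C and the angle A-B-C are assumed to come from the molecule's own Z-matrix.

```python
import numpy as np

def place_atom(a, b, c, bond, angle, torsion):
    """Return point d such that |d - c| = bond, the angle b-c-d equals
    `angle`, and the torsion a-b-c-d equals `torsion` (both in radians)."""
    bc = (c - b) / np.linalg.norm(c - b)      # unit vector along b -> c
    n = np.cross(b - a, bc)                   # normal of the plane a-b-c
    n = n / np.linalg.norm(n)
    m = np.cross(n, bc)                       # completes the local frame (bc, m, n)
    # Position of d in the local frame attached to c.
    local = np.array([-bond * np.cos(angle),
                      bond * np.sin(angle) * np.cos(torsion),
                      bond * np.sin(angle) * np.sin(torsion)])
    return c + local[0] * bc + local[1] * m + local[2] * n

# Reference frame: any three well-defined, non-collinear points.
X = np.array([0.0, 1.0, 0.0])
Y = np.array([0.0, 0.0, 0.0])
Z = np.array([1.0, 0.0, 0.0])

# Virtual coordinates (made-up values): bond Z-A, angles Y-Z-A and Z-A-B,
# torsions X-Y-Z-A, Y-Z-A-B and Z-A-B-C.
A = place_atom(X, Y, Z, bond=0.15,
               angle=np.deg2rad(110.0), torsion=np.deg2rad(60.0))
B = place_atom(Y, Z, A, bond=0.15,
               angle=np.deg2rad(109.5), torsion=np.deg2rad(180.0))
C = place_atom(Z, A, B, bond=0.14,
               angle=np.deg2rad(112.0), torsion=np.deg2rad(-65.0))
print(A, B, C)
```

Once A, B and C are known, the remaining atoms D and E depend only on atoms of the molecule itself, so the same placement step can be repeated along the Z-matrix.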

Requirements

  • pint
  • rdkit
  • numpy

Examples

Different examples for how to use this python class can be found in examples.

Remark

The algorithm is quite inefficient for large molecules. Peptides are still fine, but proteins might take a while. If the RDKit molecule object contains more than one molecule, the Z-matrix cannot be built. To process more than one molecule (e.g. a host-guest complex), either add a bond between them or build a Z-matrix for each molecule separately. The second option is the cleaner procedure; it requires building a reference frame from the first molecule, which is then used for the transformations of the second one.

r2z's People

Contributors

wutobias

r2z's Issues

error when running r2z (AttributeError: 'numpy.ndarray' object has no attribute 'in_units_of')

Hi Tobias, thanks for this package. I am new to this and am just testing whether I can get it to work with psi4. My initial attempt used a Google Colab notebook (see the attached notebook to reproduce the issue; it is essentially your zmatrix-demo notebook with very minimal tweaks).

Because r2z is not available as a Python package that can be installed with pip or conda, I just copied all functions in zmatrix.py and z_helpers.py into a cell. Running this cell before running code from the examples provides access to these functions, so there is no need to run the specific imports used in the zmatrix-demo.ipynb examples.

This is the code from the examples I am running:

from rdkit import Chem
from rdkit.Chem import AllChem
from simtk import unit
import numpy as np
import glob

from rdkit.Chem.Draw import IPythonConsole

rdmol = Chem.MolFromSmiles("CCOCC")
rdmol = AllChem.AddHs(rdmol)
setLabels(rdmol)
rdmol.Compute2DCoords()
rdmol

zm = ZMatrix(rdmol)
print(zm.z) # works OK!

AllChem.EmbedMolecule(rdmol)
AllChem.MMFFOptimizeMolecule(rdmol)
cart_crds = np.array(rdmol.GetConformers()[0].GetPositions())*unit.angstrom
print(cart_crds) # works OK!

z_crds = zm.build_z_crds(cart_crds) # FAILS!
print(z_crds) 

z_string  = zm.build_pretty_zcrds(cart_crds) # FAILS!
print(z_string)

And these are the errors I get when calling build_z_crds() or build_pretty_zcrds():

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
[<ipython-input-14-05b22af86ed5>](https://localhost:8080/#) in <cell line: 1>()
----> 1 z_crds = zm.build_z_crds(cart_crds)
      2 print(z_crds)

[<ipython-input-9-c12dad6e5d86>](https://localhost:8080/#) in build_z_crds(self, crds)
    435             z_crds_dict[z_idx] = list()
    436             if z_idx == 0:
--> 437                 z_crds_dict[z_idx].append(crds[atm_idxs[0]].in_units_of(unit.nanometer))
    438             if z_idx > 0:
    439                 dist = pts_to_bond(crds[atm_idxs[0]],

AttributeError: 'numpy.ndarray' object has no attribute 'in_units_of'


---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
[<ipython-input-8-d3110345b62c>](https://localhost:8080/#) in <cell line: 1>()
----> 1 z_string  = zm.build_pretty_zcrds(cart_crds)
      2 print(z_string)


[<ipython-input-2-35efd1e7df81>](https://localhost:8080/#) in build_pretty_zcrds(self, crds)
    379     def build_pretty_zcrds(self, crds):
    380 
--> 381         z_crds_dict = self.build_z_crds(crds)
    382         z_string    = []
    383         for z_idx, atm_idxs in self.z.items():

[<ipython-input-2-35efd1e7df81>](https://localhost:8080/#) in build_z_crds(self, crds)
    401             z_crds_dict[z_idx] = list()
    402             if z_idx == 0:
--> 403                 z_crds_dict[z_idx].append(crds[atm_idxs[0]].in_units_of(unit.nanometer))
    404             if z_idx > 0:
    405                 dist = pts_to_bond(crds[atm_idxs[0]],

AttributeError: 'numpy.ndarray' object has no attribute 'in_units_of'

There are also other errors of the same kind when calling pts_to_bond() in other cells.

zmatrix_demo_for_colab.zip

rsz_reproduce_issue.zip

SS-1: Stereochemistry

I don't think the algorithm accounts for @ or @@ enforced in SMILES stereochemistry. I could be wrong.
