charnley / rmsd

Calculate Root-mean-square deviation (RMSD) of two molecules, using rotation, in xyz or pdb format

License: BSD 2-Clause "Simplified" License

Python 99.66% Makefile 0.34%
rmsd xyz pdb alignment molecule structure reordering kabsch atoms assignment

rmsd's Introduction

Calculate Root-mean-square deviation (RMSD) of Two Molecules Using Rotation

The root-mean-square deviation (RMSD) between two sets of Cartesian coordinates, given in either .xyz or .pdb format, is calculated using the Kabsch algorithm (1976) or the Quaternion algorithm (1991) for rotation, resulting in the minimal RMSD.

For more information please read RMSD and Kabsch algorithm.

Motivation

You have molecules A and B and want to calculate the structural difference between the two. If you just calculate the RMSD directly, you might get too large a value, as seen below. You first need to recenter the two molecules and then rotate them onto each other to get the true minimal RMSD. This is what this script does.

No Changes    Re-centered    Rotated
begin         translate      rotate
==========    ===========    ==========
RMSD 2.50     RMSD 1.07      RMSD 0.25
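
The three stages in the table can be illustrated with a minimal NumPy sketch. It is not the project's own code: the function names, the test coordinates and the rotation angle are assumptions made for illustration, and the Kabsch step follows the usual SVD formulation.

```python
import numpy as np


def rmsd(P, Q):
    """Plain RMSD between two (N, 3) coordinate arrays, atom by atom."""
    diff = P - Q
    return np.sqrt((diff * diff).sum() / len(P))


def kabsch(P, Q):
    """Rotation matrix U that minimises rmsd(P @ U, Q) for centred P and Q."""
    C = P.T @ Q                       # 3x3 covariance matrix
    V, S, W = np.linalg.svd(C)
    if np.linalg.det(V) * np.linalg.det(W) < 0.0:
        V[:, -1] = -V[:, -1]          # avoid an improper rotation (reflection)
    return V @ W


# Hypothetical test data: B is A rotated about the z axis and shifted in space.
A = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.1, 0.0], [0.0, 0.0, 0.9]])
theta = np.deg2rad(40.0)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta), np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
B = A @ R + np.array([2.0, -1.0, 0.5])

raw = rmsd(A, B)                      # "No Changes"
A0 = A - A.mean(axis=0)               # move both centroids to the origin
B0 = B - B.mean(axis=0)
centered = rmsd(A0, B0)               # "Re-centered"
rotated = rmsd(A0 @ kabsch(A0, B0), B0)   # "Rotated"
print(raw, centered, rotated)
```

Because B is an exact rigid copy of A in this made-up example, the last value is zero up to floating-point round-off, while the first two are not.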

Citation

Please cite this project when using it for scientific publications.

Installation

The easiest way is to get the program via PyPI under the package name rmsd (pip install rmsd),

or download the project from GitHub (https://github.com/charnley/rmsd).

There is only one Python file, so you can also download calculate_rmsd.py and put it in your bin folder.

Usage examples

Use calculate_rmsd --help to see all the features. Usage is straightforward: call calculate_rmsd with two structures in either .xyz or .pdb format. In this example ethane has the exact same structure, but is translated in space, so the RMSD should be zero.

It is also possible to ignore all hydrogens (useful for larger molecules where indistinguishable hydrogens move around) and print the rotated structure for visual comparison. The output will be in XYZ format.

If the atoms are scrambled and not aligned you can use the --reorder argument, which will align the atoms from structure B onto A. Use --reorder-method to select the reordering method. Choose between Hungarian (default), distance (very approximate) and brute force (slow).

It is also possible to use rmsd as a library in other scripts; see example.py for example usage.

Problems?

Submit issues or pull requests on GitHub.

Contributions

Please note that we are using black with a line length of 99. The easiest way to abide by the code standard is to install the following package.

and run the following command in your repository

This will install a hook in your git repository that re-formats your code to adhere to the standard and checks for code quality.

rmsd's People

Contributors

aandi, andersx, ashafix, benjfitz, biomadeira, charlielaughton, charnley, cstein, dubinnyi, hmcezar, iribirii, kamurani, kplauritzen, larsbratholm, mcocdawc, nbehrnd, termehansen, xg590, yurivict

rmsd's Issues

add mass-weight to RMSD calculations

Hi all!

I trust you guys are all safe and healthy during this COVID-19 pandemic.

Quick question: is there any way to calculate the RMSD taking mass-weighting into account? If not, would someone be willing to add it? :-)

Please see http://archive.ambermd.org/200805/0482.html for further info.

Compared to, for example, QMol, it seems like enabling/disabling such an option yields different results.

For example:
Results from calculate_rmsd.py and QMol with "mass weighting" disabled:
v1 vs v2 10.902
v1 vs v3 0.179
v1 vs v4 5.950
v1 vs v5 10.929

They are identical.

Now with QMol and "mass weighting" enabled:
v1 vs v2 10.874
v1 vs v3 0.179
v1 vs v4 5.933
v1 vs v5 10.902

As you can appreciate, there is a slight difference but it makes a rather huge change...

QMol's source-code can be downloaded here:
http://www.ccl.net/cca/software/MS-WIN95-NT/qmol/index.shtml

I only found references to "mass weight" in QMol's kabsch.cpp and kabsch.h

I know that there is a weighted Kabsch algorithm (thanks Jimmy for the info)
https://github.com/charnley/rmsd/blob/master/rmsd/calculate_rmsd.py#L229

but, from what I see in the calculate_rmsd.py options, there are no ways to call that function.
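
For reference, a minimal sketch of what a mass-weighted RMSD could look like, assuming NumPy. This is not QMol's implementation and not the weighted Kabsch function linked above; the function name, the coordinates and the approximate atomic masses are assumptions made for illustration.

```python
import numpy as np


def weighted_kabsch_rmsd(P, Q, w):
    """Minimal weighted RMSD between (N, 3) arrays P and Q with weights w (N,)."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                               # normalised weights
    Pc = P - (w[:, None] * P).sum(axis=0)         # remove the weighted centroid
    Qc = Q - (w[:, None] * Q).sum(axis=0)
    C = (w[:, None] * Pc).T @ Qc                  # weighted covariance matrix
    V, S, W = np.linalg.svd(C)
    if np.linalg.det(V) * np.linalg.det(W) < 0.0:
        V[:, -1] = -V[:, -1]                      # keep a proper rotation
    U = V @ W
    diff = Pc @ U - Qc
    return np.sqrt((w * (diff * diff).sum(axis=1)).sum())


# Hypothetical four-atom example (C, O, H, H) with approximate atomic masses.
P = np.array([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0], [-0.5, 0.9, 0.0], [-0.5, -0.4, 0.8]])
Q = P + np.array([[0.0, 0.0, 0.0], [0.05, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, -0.1]])
masses = np.array([12.011, 15.999, 1.008, 1.008])

print(weighted_kabsch_rmsd(P, Q, masses))            # mass-weighted
print(weighted_kabsch_rmsd(P, Q, np.ones(len(P))))   # equal weights
```

With equal weights the function reduces to the ordinary Kabsch RMSD, so the difference between the two printed numbers is the effect of the weighting alone.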

Volunteers? :-)

Thanks a lot in advance for your time and consideration!

I look forward to hearing from you soon!

Please stay safe!

Best,
Martin

rmsd package requires typing_extensions but missing from setup.py

Create a new virtual environment:

% python3 -m venv /tmp/v3
% source /tmp/v3/bin/activate
(v3) % pip install rmsd
Collecting rmsd
Using cached rmsd-1.5.0-py3-none-any.whl (17 kB)
Collecting scipy
Using cached scipy-1.9.3-cp310-cp310-macosx_10_9_x86_64.whl (34.3 MB)
Collecting numpy
Using cached numpy-1.24.0-cp310-cp310-macosx_10_9_x86_64.whl (19.8 MB)
Installing collected packages: numpy, scipy, rmsd
Successfully installed numpy-1.24.0 rmsd-1.5.0 scipy-1.9.3

% python3
Python 3.10.9 (main, Dec 15 2022, 18:25:35) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

>>> from rmsd.calculate_rmsd import (NAMES_ELEMENT, centroid, check_reflections, rmsd)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/private/tmp/v3/lib/python3.10/site-packages/rmsd/__init__.py", line 2, in <module>
    from .calculate_rmsd import *
  File "/private/tmp/v3/lib/python3.10/site-packages/rmsd/calculate_rmsd.py", line 25, in <module>
    from typing_extensions import Protocol
ModuleNotFoundError: No module named 'typing_extensions'
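
Apart from declaring the dependency, one possible way around the error is a guarded import. The sketch below relies on Protocol being part of the standard typing module since Python 3.8; the class is hypothetical and only shows that the imported name is usable. It is not the fix the maintainers applied.

```python
try:
    # Python 3.8 and later ship Protocol in the standard library.
    from typing import Protocol
except ImportError:
    # Older interpreters need the typing_extensions backport.
    from typing_extensions import Protocol


class HasCoordinates(Protocol):
    """Hypothetical protocol, only here to show that the import works."""

    def coordinates(self) -> list: ...
```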

Problems with check_reflections function

I know this function is new to rmsd and so might not be fully vetted, but I am having issues with check_reflections. I tested it with two sets of identical (fake) coordinates, and the function reports a non-zero rmsd between them and also suggests reflecting the x-axis. The code I ran is:

import calculate_rmsd as rmsd
import numpy as np

atoms = ["C","O","O"]
coord1 = np.array([[1.0,1.0,1.0],[2.0,2.0,2.0],[3.0,3.0,3.0]])
coord2 = coord1

atoms=np.array(atoms)

a, b, c, d = rmsd.check_reflections(atoms,atoms,coord1,coord2)
print("min_rmsd = %s" % a, "\nmin_swap = %s" % b, "\nmin_reflection = %s" % c, "\nmin_review = %s" % d)

The output is:

min_rmsd = 3.4641016151377526
min_swap = [0 1 2]
min_reflection = [-1 1 1]
min_review = [0 1 2]

Is there something I'm doing wrong in implementing this function, or is it a bug I can't figure out?

infer the two coordinate sets for 3D data

Dear Charnley,

How can I obtain the fitted coordinates for a pair of 3D datasets?
Besides the rmsd value, I would like to plot the two datasets next to each other.

Thanks in advance.
Best Regards,
Attila

reordering while preparing for output

Hi,
using the module I got geometries which, in my humble opinion, are too far from each other for the very small RMSD calculated. The molecules look similar if I take the original geometries.
When the --reorder option is turned on, the q_coord list is reordered (lines 2021-2022). Aren't the lines (2046-2047)

        if q_review is not None:
            q_coord = q_coord[q_review]
            q_all_atoms = q_all_atoms[q_review]

changing their order again, when output is requested? When I commented out the mentioned part, the new geometries look more aligned with each other and my expectations.

printed structure does not obey --use-reflections

Using the --use-reflections option with the calculate_rmsd script affects the computed rmsd, but it doesn't seem to affect the output structure when -p is used too:

$ cat a.xyz
8

C     -0.75194898     0.15712980    -0.62038769
H     -1.32975650    -0.44037197    -1.31263409
H     -0.93962923     1.21050962    -0.77368922
C      0.75194846    -0.15712928    -0.62038611
H      0.93962586    -1.21050994    -0.77368854
H      1.32975920     0.44036972    -1.31263191
O     -1.10924506    -0.23012201     0.71758377
O      1.10924528     0.23012185     0.71758355

$ cat b.xyz 
8

C     -0.78959873     0.10620675    -0.52734402
H     -1.27819677    -0.85803167    -0.54888891
H     -0.99000386     0.63677309    -1.45273479
C      0.78962678    -0.10600531    -0.52735255
H      0.98992895    -0.63685564    -1.45258775
H      1.27843070     0.85811519    -0.54907319
O     -1.33661735     0.82461649     0.53999508
O      1.33642931    -0.82482111     0.53973588

$ calculate_rmsd a.xyz b.xyz 
0.8308965795175342

$ calculate_rmsd a.xyz b.xyz --use-reflections
0.3006966184789406

$ calculate_rmsd a.xyz b.xyz -p > b1.xyz

$ calculate_rmsd a.xyz b.xyz --use-reflections -p > b2.xyz

$ calculate_rmsd a.xyz b1.xyz
0.8308965795175342

$ calculate_rmsd a.xyz b2.xyz
0.8308965795175342

$ diff -s b1.xyz b2.xyz
Files b1.xyz and b2.xyz are identical

$ calculate_rmsd --version
rmsd 1.3.2

See https://github.com/charnley/rmsd for citation information

Moreover, with the latest pip version, --use-reflections does almost nothing at all:

$ calculate_rmsd --version
rmsd 1.4

See https://github.com/charnley/rmsd for citation information

$ calculate_rmsd a.xyz b.xyz 
0.830896579517534

$ calculate_rmsd a.xyz b.xyz --use-reflections
0.8308965795175339

Citations and license are outputted to terminal with the help message

There is no reason to print these messages when the user asks for the help message

license:
  https://github.com/charnley/rmsd/blob/master/LICENSE

citation:
  Kabsch algorithm:
    Kabsch W., 1976, A solution for the best rotation to relate two sets of
    vectors, Acta Crystallographica, A32:922-923, doi:10.1107/S0567739476001873

  Quaternion algorithm:
    Michael W. Walker and Lejun Shao and Richard A. Volz, 1991, Estimating 3-D
    location parameters using dual number quaternions, CVGIP: Image
    Understanding, 54:358-367, doi: 10.1016/1049-9660(91)90036-o

  Implementation:
    Calculate RMSD for two XYZ structures, GitHub,
    http://github.com/charnley/rmsd

This is fine in the github readme, but when using it in terminal we only need the Usage information

error: Structures not same size

I used pd2_ca2main to generate a PDB file.
I am trying to compare the new file with the old one, but I can't.

user_name@server_name:~/bbq_spatial$ python3 calculate_rmsd.py  ./bbq_input_pdb/pdb1akp.pdb  ./bbq_output_pdb/pdb1akp.pdb
error: Structures not same size
user_name@server_name:~/bbq_spatial$

error: cannot reorder atoms and print result with a view

Hello,

The atoms of my structures A and B are not in the same order so I am trying to use --reorder to align the atoms. I also need to print the rotated structure in an output file for further use.
However when I use --reorder and --print at the same time (python calculate_rmsd.py --reorder --no-hydrogen --print A.xyz B.xyz > C.xyz), I have the following error message:
error: cannot reorder atoms and print result with a view.

Could you please help me with this?

Many Thanks

error: Structures not same size

HELLO,

I would like to calculate the rmsd between the native and redocked ligand.

calculate_rmsd --reorder native.pdb redocked.pdb

native.txt
redocked.txt

error: Structures not same size

The number of atoms in both files is the same. Could you please help me to solve this simple error?

Redundant matrix calculation in quaternion method

Since the coordinates have been moved to the centroid before applying the quaternion rotation, the matrices C2, C3 and A become redundant, and the minimal eigenvalue of C1 minimizes the RMSD.

def quaternion_rotate(X, Y):
    """
    Calculate the rotation
    """
    N = X.shape[0]
    W = np.asarray([makeW(*Y[k]) for k in range(N)])
    Q = np.asarray([makeQ(*X[k]) for k in range(N)])
    Qt_dot_W = np.asarray([np.dot(Q[k].T, W[k]) for k in range(N)])
    W_minus_Q = np.asarray([W[k] - Q[k] for k in range(N)])
    A = -np.sum(Qt_dot_W, axis=0)
    eigen = np.linalg.eigh(A)
    r = eigen[1][:, eigen[0].argmin()]
    rot = quaternion_transform(r)
    return rot

Fails to read XYZ files written by ASE

Example

19
Properties=species:S:1:pos:R:3:forces:R:3 2S-2-Amino-3-methylbutanoic=T acid=T energy=-10875.507103851405 dipole="-0.11693122986387085 0.2720462997103968 0.17472161467512684" magmom=0.0 pbc="F F F"
C       0.22656880      -0.56580271       0.37053473       0.00113129      -0.01146712       0.00236542
N       1.53104275      -1.25352933       0.37698414      -0.00611923       0.00493652       0.00447372
C       0.24487845       0.94565579       0.82884627      -0.00375381      -0.00766189      -0.01280409
C      -1.18278502       1.53035594       0.91007832      -0.00606780      -0.00133697      -0.00051422
C       0.94733822       1.06970643       2.19799655       0.00257110       0.00982161      -0.01501524
C      -0.22962089      -0.60972299      -1.08178558       0.00298248      -0.00293106       0.00534789
O       0.51218343      -0.53807897      -2.07032273      -0.00113129       0.00951308      -0.00071991
O      -1.61832134      -0.67908207      -1.18057738       0.00833037       0.00956450       0.00287964
H      -0.50313117      -1.10973073       0.99585130      -0.00431945      -0.00077133       0.00025711
H       2.01037607      -1.19295784       1.28583059      -0.01007873      -0.01013015       0.00190262
H       2.11635497      -0.89127820      -0.39447838      -0.00961593      -0.01177565      -0.01110717
H       0.83331353       1.49436672       0.06509178       0.00406234       0.01362685       0.01254698
H      -1.74131800       1.35179101      -0.02207033       0.00478225      -0.01043868      -0.00380523
H      -1.14539339       2.61527111       1.11156942      -0.00133697       0.00385666       0.00534789
H      -1.73810741       1.04465878       1.73441880       0.00622207      -0.01064437       0.00421661
H       0.99277380       2.12810382       2.50783434      -0.00390808       0.01177565       0.01383254
H       1.98011451       0.68259268       2.15005599       0.00725051       0.00128555       0.00642776
H       0.38544800       0.50413825       2.96567565       0.00833037       0.00930739      -0.00658202
H      -1.83911536      -0.65210679      -2.16786412       0.00056564      -0.00647918      -0.00910171                                                                                                                                                                 

Error:
Reading the .xyz file failed in line 2. Please check the format.

pdb coordinate reader: error: Parsing coordinates for the following line

if x_column == None:
    try:
        # look for x column
        for i, x in enumerate(tokens):
            if "." in x and "." in tokens[i + 1] and "." in tokens[i + 2]:
                x_column = i
                break
    except IndexError:
        exit("error: Parsing coordinates for the following line: \n{0:s}".format(line))

If the pdb line looks like 'ATOM 383 C6 C B 122 -2.217 -2.542-103.749' (the values of y and z run together without a separating space), the code will exit and the coordinates cannot be obtained.
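
One way to sidestep fused values is to read the coordinates by column position instead of splitting on whitespace. The sketch below assumes the record follows the fixed-width PDB layout, in which x, y and z occupy columns 31-38, 39-46 and 47-54; the function name and the padded example record are made up for illustration, and this is not the reader used by calculate_rmsd.py.

```python
def parse_pdb_coordinates(line):
    """Read x, y, z from an ATOM/HETATM record using fixed PDB columns.

    Columns 31-38, 39-46 and 47-54 (1-based) hold x, y and z, so values that
    touch each other without a separating space are still read correctly.
    """
    x = float(line[30:38])
    y = float(line[38:46])
    z = float(line[46:54])
    return x, y, z


# Build a correctly padded record in which y and z run together.
record = "{:<6}{:>5} {:<4}{:1}{:>3} {:1}{:>4}{:1}   {:8.3f}{:8.3f}{:8.3f}".format(
    "ATOM", 383, " C6", "", "C", "B", 122, "", -2.217, -2.542, -103.749
)
print(repr(record[30:54]))
print(parse_pdb_coordinates(record))
```

This only works for files that keep the standard column alignment; whitespace-delimited files with shifted columns would still need the token-based search.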

Issue with Importing RMSD in Python

Hello.

After installing rmsd, I tried using the functions that are part of rmsd, as I am working on a script. However, I keep getting the following error message (even after uninstalling and reinstalling):

AttributeError: module 'rmsd' has no attribute 'kabsch_rmsd'

I'd like to know how I can work around this issue. Thanks!

Citing charnley/rmsd (with a DOI)

Hi Jimmy,

Thanks a lot for having written (and distributing) charnley/rmsd. I have used your code as part of my PhD work and am looking for the best way of citing your work. Have you considered using Zenodo for your project? This would automatically give you a DOI for every release, and is very easy to set up.

Thanks, Bertrand

Pip version has slight error

If installing directly from pip (i.e. pip install rmsd), running with --output does not generate the correct geometry. It appears the pip version is missing the step where p_all is translated by Qc (Q centroid coordinates):

pip version:

U = kabsch(P, Q)
p_all -= Pc
p_all = np.dot(p_all, U)
write_coordinates(p_atoms, p_all, title="{} translated".format(args.structure_a))
quit()

GitHub version:

U = kabsch(P, Q)
p_all -= Pc
p_all = np.dot(p_all, U)
p_all += Qc
write_coordinates(p_atoms, p_all, title="{} translated".format(args.structure_a))
quit()

Can't install with pip

I started a new python2 virtualenv with only numpy installed.

I cloned this repository and changed to the pip branch.
Trying to install I get the following error

(test_rmsd_2) primdal at Kaspers-MBP in ~/dev/rmsd at [13:58] (pip)
$ pip install .
Processing /Users/primdal/dev/rmsd
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/6l/r_g4km7552zd0s6jyvlyn6100000gn/T/pip-zDQPdV-build/setup.py", line 11, in <module>
        from rmsd.calculate_rmsd import __version__
      File "rmsd/__init__.py", line 2, in <module>
        from rmsd.calculate_rmsd import *
      File "rmsd/calculate_rmsd.py", line 17, in <module>
        from builtins import range
    ImportError: No module named builtins

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/6l/r_g4km7552zd0s6jyvlyn6100000gn/T/pip-zDQPdV-build/

builtins seems to be a python3-only package

Doing the same with a python3 virtualenv is fine. No errors encountered

saving rotated coordinate?

Hi,
I came to know about this tool from openbabel github page.
I'm interested in translating and then rotating a pdb structure such that a specified atom (X) is at the center and X-Y bond is aligned in the -z direction.

Thus I'm just curious whether it is possible to save rotated structures in addition to printing the rmsd with this "rmsd" program.
Looking forward to valuable suggestions.
thank you and best regards,
Vaibhav

--reorder-method qml --reorder-method none currently not available

This issue report evolves from an earlier discussion here.

With the test data provided in the .zip attached below, the RMSD of an alignment works fine e.g.,

$ calculate_rmsd --reorder --reorder-method brute 1.xyz 2.xyz 
0.0012379383034675595

However, this is not the case opting for qml:

$ calculate_rmsd --reorder --reorder-method qml 1.xyz 2.xyz 
Traceback (most recent call last):
  File "/usr/local/bin/calculate_rmsd", line 8, in <module>
    sys.exit(main())
  File "/home/USER/.local/lib/python3.9/site-packages/rmsd/calculate_rmsd.py", line 1965, in main
    q_review = reorder_method(p_atoms, q_atoms, p_coord, q_coord)
  File "/home/USER/.local/lib/python3.9/site-packages/rmsd/calculate_rmsd.py", line 829, in reorder_similarity
    p_vecs = qml.representations.generate_fchl_acsf(
AttributeError: module 'qml.representations' has no attribute 'generate_fchl_acsf'

even though qml is installed with pip for Python 3 (version 0.2.1, Anders S. Christensen (2016), according to the version information shown by help(qml) after import qml in Python). Its PyPI page states version 0.4.0.27 as the one currently provided, suggesting a discrepancy.

Second, contrary to the documentation displayed by calculate_rmsd --help, the level none does not work:

calculate_rmsd --reorder --reorder-method none 1.xyz 2.xyz 
Traceback (most recent call last):
  File "/usr/local/bin/calculate_rmsd", line 8, in <module>
    sys.exit(main())
  File "/home/USER/.local/lib/python3.9/site-packages/rmsd/calculate_rmsd.py", line 1965, in main
    q_review = reorder_method(p_atoms, q_atoms, p_coord, q_coord)
UnboundLocalError: local variable 'reorder_method' referenced before assignment

The observations refer to an installation in Linux Debian 12/bookworm (branch testing), Python 3.9.10, and rmsd 1.4 installed via pip for Python 3.

2022-03-04_data.zip

RMSD operation in a numpy array of molecules

I want to calculate RMSD between A and B, both having the following shape:

[1000,57,3] where the first dimension corresponds to the number of molecules, the second to the number of atoms, and the third to the x, y, z positions.

It would be great if this operation could be done without iterating over the first dimension. I've tried, but np.dot(A.T,B) throws an error, so I think I'll need to modify the code to accomplish this.
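
A possible loop-free formulation, sketched under the assumption that A[i] is compared with B[i] and that NumPy's batched SVD is acceptable. The function name and the random test data are hypothetical; this is not part of the rmsd package.

```python
import numpy as np


def batched_kabsch_rmsd(A, B):
    """Minimal RMSD for each pair A[i], B[i]; A and B have shape (M, N, 3)."""
    A = A - A.mean(axis=1, keepdims=True)         # centre every molecule
    B = B - B.mean(axis=1, keepdims=True)
    C = np.einsum("mni,mnj->mij", A, B)           # (M, 3, 3) covariance matrices
    V, S, W = np.linalg.svd(C)                    # SVD over the leading axis
    flip = np.linalg.det(V) * np.linalg.det(W) < 0.0
    V[flip, :, -1] *= -1.0                        # fix improper rotations
    U = V @ W                                     # (M, 3, 3) rotation matrices
    diff = A @ U - B
    return np.sqrt((diff * diff).sum(axis=(1, 2)) / A.shape[1])


# Hypothetical data with the shape from the question.
rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 57, 3))
B = A + rng.normal(scale=0.1, size=A.shape)
values = batched_kabsch_rmsd(A, B)
print(values.shape)
```

np.dot(A.T, B) fails because it treats the stacked arrays as single matrices; einsum (or np.matmul on transposed operands) keeps the molecule axis separate.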

Please let me know your comments.

feature suggest: output of *.xyz .AND. RMSD

I would like to suggest, as an additional option, output of the aligned
structure B, in *.xyz format, and the RMSD in one step.
For larger clusters, which take longer to process,
and with the issue of single-core computation pending, this may
cut the computational time by half.

My understanding of the argparse-defining section (lines #795
and following) is too limited to achieve this by adding a line like
parser.add_argument('P', action='store_true', help='print out structure B, after refined fit in XYZ format and provide RMSD')
(led by an upper-case P), tentatively placed on line #1040 and
followed by an (edited) copy-paste of the section from line #999 up to and
including line #1039.

What worked for me as short-cut is the copy-paste of line #1036 just
after line #1023 to obtain a permanent record of RMSD with
the corresponding alignment which however works only when either
--use-reflections -p
or
--use-reflections-keep-stereo -p
is used. The numeric value of RMSD however is the same as when
calling calculate_rmsd.py without --use-reflections or --use-reflections-keep-stereo
without the optional -p parameter. Perhaps I missed an entry
corresponding to filename of either fileA, and fileB easing the
generation of a permanent record of aligned molecule B if the
script is called within a loop, but may harvest both types of
information from the CLI piped output /via/ grep.

The attached *.txt is the slightly modified script, intended as a source of
inspiration rather than as a commit.
20190326-modified-calculate_rmsd.py.txt

error: Structures not same size

Sorry to bother you with this little problem.
The two PDB files I want to compare have different sizes.
Does "error: Structures not same size" mean that the two structures need to be the same size?

numpy needs to be installed first, separately, because of rmsd's setup.py structure

Currently the rmsd setup.py has install_requires=['numpy',]. In theory this should mean that you can do pip install rmsd on a system with no numpy installed, and pip will install both rmsd and numpy.

However currently this is wasted, and trying to do so will throw a ModuleNotFoundError: No module named 'numpy'. This is because rmsd's setup.py imports rmsd, which imports numpy, which hasn't been installed yet.

I think this could be fixed by changing import rmsd to from rmsd import __version__. I can make a pull request if you would prefer.
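
An alternative that avoids importing the package at all is to read the version string from the source text. The sketch below is an assumption about how that could look, not the change that was merged; read_version and example_source are hypothetical names, and in a real setup.py the text would come from reading rmsd/calculate_rmsd.py.

```python
import re


def read_version(source_text):
    """Extract __version__ = "..." from module source without importing it."""
    match = re.search(r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", source_text, re.M)
    if match is None:
        raise RuntimeError("no __version__ string found")
    return match.group(1)


# A literal string stands in for the file contents here.
example_source = 'import numpy as np\n__version__ = "1.3.2"\n'
print(read_version(example_source))
```

Since the module is never executed, its numpy import is not triggered during installation.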

Otherwise it is currently hard to make rmsd a dependency because then installing those projects with pip will fail.

Thanks!

Reordering two differently sized but similar molecules

I'm trying to reorder two differently sized molecules so that they share as much of the atom numbering as possible. I've seen you've got this feature for molecules of the same size, which is brilliant, but how could I go about comparing two differently sized molecules? Any help would be very appreciated!

Cheers,

Jon

Modify W instead of V

rmsd/rmsd/calculate_rmsd.py

Lines 155 to 157 in cd8af49

if d:
    S[-1] = -S[-1]
    V[:, -1] = -V[:, -1]

Why do we have to modify the value of S when it is not needed anywhere else in the function? Also, instead of modifying

V[:, -1] = -V[:, -1] 

can't we just do this

W[-1]  = -W[-1]

Both properly change the sign of the rotation matrix, but in the latter case we only have to traverse the outer index. This can result in a (very) slight gain in computation time for a larger number of particles.
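
The claimed equivalence can be checked numerically. A small sketch, assuming NumPy and a random 3x3 matrix in place of a real covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
C = rng.normal(size=(3, 3))              # stand-in for a covariance matrix
V, S, W = np.linalg.svd(C)

V_flipped = V.copy()
V_flipped[:, -1] = -V_flipped[:, -1]     # negate the last column of V

W_flipped = W.copy()
W_flipped[-1] = -W_flipped[-1]           # negate the last row of W

U_from_V = V_flipped @ W
U_from_W = V @ W_flipped
print(np.allclose(U_from_V, U_from_W))
```

The product V @ W is a sum of outer products of the columns of V with the rows of W, so negating either the last column of V or the last row of W flips the sign of the same term.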

How to align both position and orientation?

Thanks for the great library! It looks like just what I was looking for apart from one thing, and I was wondering if someone could leave some comments.

I have two sets, P and Q. Elements of P correspond to elements of Q. Each element has a set of three coordinates in the 3D space and also orientation coordinates (which can be expressed using Euler angles, quaternions, rotational matrix, anything that is most convenient). How to align P and Q?

According to the Quaternion Algorithm paper [2] cited in the README, both position and orientation are getting aligned. Quoting from the abstract

This paper describes a new algorithm for estimating the position and orientation of objects (...)

However, when I look up the tests I see that even with both positions and orientations available, only the position coordinates are taken into account

result = rmsd.quaternion_rmsd(p_coord, q_coord)

I think I might have misunderstood something. Was the orientation alignment purposely skipped (as it might not be relevant to the usual use case)? Could someone recommend how to solve my problem? Perhaps a different algorithm would be more appropriate.

Many thanks in advance!

ValueError: operands could not be broadcast together with shapes (3) (0)

while rmsd'ing these two xyz files getting the following error:

fedor@slater:~/tmp$ python calculate_rmsd.py B2H6_.xyz B2H6.xyz
Normal RMSD: 0.0205857986547
Traceback (most recent call last):
  File "calculate_rmsd.py", line 168, in <module>
    Qc = centroid(Q)
  File "calculate_rmsd.py", line 92, in centroid
    C = sum(X)/len(X)
ValueError: operands could not be broadcast together with shapes (3) (0)

The two files are:
(B2H6.xyz):

B2H6
optimized geometry:
B1 0.000000 0.000000 0.897697
B2 0.000000 0.000000 -0.897697
H3 -0.964674 0.000000 0.000000
H4 0.964674 0.000000 0.000000
H5 0.000000 1.031433 1.471441
H6 0.000000 -1.031433 1.471441
H7 0.000000 1.031433 -1.471441
H8 0.000000 -1.031433 -1.471441

(B2H6_.xyz):

8
B2H6_MPW1K-LACVP#+.01
B 0.00000 0.00000 0.87380
B 0.00000 0.00000 -0.87380
H -0.97570 0.00000 0.00000
H 0.97570 0.00000 0.00000
H 0.00000 1.03820 1.45010
H 0.00000 -1.03820 1.45010
H 0.00000 1.03820 -1.45010
H 0.00000 -1.03820 -1.45010

Align N molecules to M other molecules

Code in case this is implemented in the future.

Where P is (N, natoms, 3) and Q is (M, natoms, 3).

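import numpy as np

# P has shape (N, natoms, 3), Q has shape (M, natoms, 3); both are assumed centered.
C = np.einsum("nai,maj->nmij", P, Q)  # (N, M, 3, 3) covariance matrices
V, S, W = np.linalg.svd(C)
d = (np.linalg.det(V) * np.linalg.det(W)) < 0.0
S[d, -1] *= -1
V[d, :, -1] *= -1

U = np.matmul(V, W)  # (N, M, 3, 3) rotation matrices

P_rot = np.einsum("nak,nmkb->nmab", P, U)  # each P[n] rotated onto each Q[m]

delta = P_rot - Q[np.newaxis]  # (N, M, natoms, 3)

rmsd = np.sqrt(np.sum(delta**2, axis=(2, 3)) / P.shape[1])  # (N, M) RMSD matrix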

How to get full transformation in a script?

I'd like to apply a full alignment, with possible reflection and reordering of atoms, and I need to re-use the resulting transformation matrix(ces) to transform other data (e.g. forces). With -e -ur I can do the alignment in command line, but how can I do the same inside a script (where the coordinates are not necessarily written in a file) and get the transformations?
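
As a starting point, the rotation and translation can be computed directly with NumPy and then reused on other per-atom vectors. The sketch below covers the rigid fit only; reflections and atom reordering are left out, and kabsch_transform, the coordinates and the forces are hypothetical names and data, not the package's API.

```python
import numpy as np


def kabsch_transform(P, Q):
    """Return (U, t) such that P @ U + t is the best rigid fit of P onto Q."""
    Pc = P.mean(axis=0)
    Qc = Q.mean(axis=0)
    C = (P - Pc).T @ (Q - Qc)             # covariance of the centred sets
    V, S, W = np.linalg.svd(C)
    if np.linalg.det(V) * np.linalg.det(W) < 0.0:
        V[:, -1] = -V[:, -1]              # keep a proper rotation
    U = V @ W
    t = Qc - Pc @ U                       # translation applied after rotating
    return U, t


# Hypothetical data: Q is P rotated about z and shifted.
P = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.1, 0.0], [0.0, 0.0, 0.9]])
theta = np.deg2rad(25.0)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta), np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
Q = P @ R + np.array([3.0, 0.0, -1.0])
forces = np.array([[0.1, 0.0, 0.0], [0.0, 0.2, 0.0], [0.0, 0.0, 0.3], [0.1, 0.1, 0.0]])

U, t = kabsch_transform(P, Q)
P_fitted = P @ U + t                      # coordinates: rotate, then translate
forces_fitted = forces @ U                # vectors such as forces: rotate only
print(U)
print(t)
```

Positions need both U and t, whereas direction-like quantities such as forces or dipoles only need the rotation.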

multiprocessing with calculate_rmsd.py

Interested in deploying calculate_rmsd.py in a series of comparisons with
"true clusters" (see below), my intent was to drive calculate_rmsd.py
with a moderator script. For smaller tests (one molecule A vs. one molecule
B), a linear approach was fast enough. Now facing larger clusters,
the thought occurred to me that the tests could be performed
in parallel, on multiple CPU cores. My call of the multiprocessing module,
however, yields a situation where each CPU seems to see the list of
tests to perform as its own, to the extent that each core engaged performs
the task even if another core has already determined the RMSD.

Question: Does another user of calculate_rmsd.py have experience with
parallelizing the script? Is there a keyword I have not spotted yet to organize the
computation within this loop?

Disclaimer: because this is the first time that I am looking for a parallelized
approach, approaches other than the multiprocessing module may
work as well. The intent is to run comparisons with files like the attached ones for series
of 50...150 test clusters each, with about a dozen molecules per cluster,
edited such that the script may work with them. This will rapidly increase the
computational cost (i.e., n * (n-1)/2 tests per n clusters). Suggestions may
be helpful for related issues deposited here, too.
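
One common pattern is to build the list of pairs once and let Pool.map distribute it, so that every pair is handled by exactly one worker. The sketch below is an assumption about how such a moderator script could be organized; compare uses a placeholder metric instead of a real RMSD call, and all function names are hypothetical.

```python
import itertools
import math
from multiprocessing import Pool


def all_pairs(items):
    """Every unordered pair of items exactly once: n * (n - 1) / 2 jobs."""
    return list(itertools.combinations(enumerate(items), 2))


def compare(pair):
    """Stand-in for one comparison; returns (i, j, value)."""
    (i, a), (j, b) = pair
    # Placeholder metric (distance between two points), NOT a real RMSD.
    return i, j, math.dist(a, b)


def run_parallel(items, processes=2):
    """Distribute the pair list over worker processes with Pool.map."""
    with Pool(processes=processes) as pool:
        # map() hands each job to exactly one worker, so no pair is repeated.
        return pool.map(compare, all_pairs(items))
```

run_parallel should be called from under an if __name__ == "__main__": guard, because worker processes may re-import the script on some platforms. If every worker appears to repeat the whole list, the loop over all tests is probably running inside the function passed to the pool rather than outside it.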

simplifiedModerator.py.txt
modelData.zip

reading .xyz files doesn't work when they contain scientific notation

for example, this file,

19

C -2.293946 -3e-06 -0.156296
C -1.072129 0.756112 -0.6106
C -1.072122 -0.756104 -0.610611
C -0.06715 -1.123802 0.475015
C 1.358616 -1.188877 -0.086502
O 1.677474 3e-06 -0.799713
C 1.358611 1.18888 -0.086499
C -0.067151 1.123797 0.475026
O -0.11043 -8e-06 1.371348
H -2.543398 -1.2e-05 0.89919
H -3.143761 -8e-06 -0.829798
H -1.086381 1.326009 -1.5315
H -1.086375 -1.325977 -1.531525
H -0.309135 -2.019187 1.056162
H 2.055688 -1.340367 0.752686
H 1.481047 -2.012895 -0.797179
H 2.055691 1.340376 0.752679
H 1.481026 2.012901 -0.797177
H -0.309129 2.019175 1.056186

gives the error message,
Reading the .xyz file failed in line 2. Please check the format.
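
A reader that converts each field with float() accepts values such as -3e-06 without any special handling. The following is a minimal sketch of that idea, not the parser shipped with rmsd; read_xyz and the two-atom sample are hypothetical.

```python
def read_xyz(text):
    """Parse XYZ text into (atoms, coordinates).

    float() accepts both 0.156296 and 3e-06, so no pattern for the number
    format is needed. Extra columns after x, y, z are ignored.
    """
    lines = text.strip("\n").split("\n")
    n_atoms = int(lines[0].split()[0])        # first line: number of atoms
    atoms, coordinates = [], []
    for line in lines[2:2 + n_atoms]:         # second line is the title/comment
        tokens = line.split()
        atoms.append(tokens[0])
        coordinates.append([float(value) for value in tokens[1:4]])
    if len(atoms) != n_atoms:
        raise ValueError("atom count does not match the first line")
    return atoms, coordinates


sample = "2\n\nC -2.293946 -3e-06 -0.156296\nO 1.677474 3e-06 -0.799713\n"
atoms, coordinates = read_xyz(sample)
print(atoms, coordinates)
```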

Issue with capitalization of elements >1 letter

I noticed this issue

atoms, coords = rmsd.get_coordinates(file_with_chlorine, "xyz")
print(atoms)

Gives, for example

['C' 'C' 'CL' 'C' 'C' 'H' 'H' 'H' 'H' 'H' 'H' 'H']

Even if the input file denotes chlorine as "Cl".

This means that

output = rmsd.set_coordinates(atoms, aligned_coordinates, title="wtf")

produces output with CL rather than Cl.

IMO it should either (1) preserve the input capitalization or (2) output the standard capitalization.

I will make a PR with (1) unless there are good reasons for (2), or if this is the intended behavior?
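
For comparison, option (2) could be as small as the sketch below, which uses Python's str.capitalize to produce the standard spelling. normalize_symbol is a hypothetical helper, not a function of the rmsd package.

```python
def normalize_symbol(symbol):
    """Return standard capitalisation: first letter upper, the rest lower."""
    return symbol.strip().capitalize()


atoms = ["C", "C", "CL", "C", "H", "br"]
print([normalize_symbol(atom) for atom in atoms])
```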

Willing to add the rmsd value to title line?

Sometimes I am interested in knowing the rmsd value for the modified structure. Now I need to run calculate_rmsd.py twice (with and without -p).
With this simple change the rmsd value is printed in the title line of the xyz file.

line 2052 from:
xyz = set_coordinates(q_all_atoms, q_coord, title=f"{settings.structure_b} - modified")
to
xyz = set_coordinates(q_all_atoms, q_coord, title=f"{settings.structure_b} - modified. RMSD = {result_rmsd}")

Output rotation matrix

Is there a flag that would output not (only) the minimized rmsd and the rotated B structure, but the rotation matrix (and translation vector) that needs to be applied? This would be useful when the same transformation has to be applied to other data, like multipole moments, surrounding molecules, etc.

Edit: I'm talking about the command-line script calculate_rmsd.

Incorrect values for proteins

Hi there, so I'm not sure where exactly to put this - and it might be my fault - but I am having problems with some protein comparisons.

I have taken the protein 1L14 from PDB, taken the sequence and run it through AlphaFold to see how it folds. I've also run 1L14 through the Rosetta scoring function just to have it attach the missing hydrogens so they can be compared with the AlphaFold results. AlphaFold gave some nice results, which I've compared through PyMol, and that gives me an RMSD of 0.393. I'm submitting all of them here (original 1L14; 1L14 with hydrogens as given by Rosetta; and the best AlphaFold result) as TXT files, since PDBs cannot be uploaded.

1L14.txt
1L14-withH.txt
ranked_0.txt

However, when I use calculate_rmsd --reorder 1L14-withH.pdb ranked_0.pdb to get the result, I am getting RMSD of 10.84, and using --no-hydrogen does not make it much better (10.82). This value is about 30 times higher than what PyMol gives me.

My question is how, why, and is it my mistake? Or have I missed something? The structure has 164 residues and around 2400 atoms, so it cannot be that I forgot to divide by something. Test examples work fine (ethane/translated ethane gives me something very close to zero). Thanks for any help.

Pre-specify residuals?

I'm looking for a method that lets me specify a list of target distances for the RMSD minimization. That is, given two point clouds, each of size N, with a 1-1 correspondence, instead of making the distance between each pair as close to zero as possible, make it as close as possible to some distance k_i (varying for each pair).
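One way to write down the requested objective, as a sketch of how the question reads: the symbols R (rotation), t (translation), p_i and q_i (the paired points) are notation assumed here; only k_i comes from the request itself.

```latex
\min_{R,\,t} \; \sum_{i=1}^{N} \left( \lVert R\,p_i + t - q_i \rVert - k_i \right)^2
```

Setting every k_i = 0 recovers the ordinary least-squares superposition that the Kabsch algorithm solves.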

Kabsch not zero for same file

When using

calculate_rmsd configuration1.xyz configuration1.xyz

the result should be 0, since the data sets are identical. However, for the attached file, the output is

Normal RMSD: 0.0
Kabsch RMSD: 6.16698752446e-16
Quater RMSD: 0.0

Close enough, but why?


Example: 2-bromo-cyclohexan-1-one conformation1.xyz

17
SCF done       0.00000000
C          0.21071       -0.08068       -1.01215
C         -0.22200        1.24756       -0.37911
C          0.16684        1.30751        1.08071
C         -0.22332        0.10072        1.90828
C          0.16529       -1.20784        1.23556
C         -0.32656       -1.26651       -0.21039
H          1.30419       -0.12448       -1.04911
H         -0.16157       -0.12916       -2.03942
H         -1.30963        1.34512       -0.43942
H          0.22983        2.09315       -0.90165
H          0.22862        0.18105        2.89713
H          1.25813       -1.28534        1.25583
H         -0.24087       -2.04734        1.80504
H         -1.42103       -1.25606       -0.22425
H          0.00821       -2.20042       -0.66980
O          0.74759        2.24101        1.56474
Br        -2.17038        0.22320        2.17266
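On the "why": a residual of this size is consistent with ordinary double-precision rounding, presumably picked up in the extra matrix arithmetic of the Kabsch path (this is an assumption, not a confirmed diagnosis). The sketch below compares the reported value with machine epsilon and shows a tolerance-based comparison against zero; the tolerance 1e-12 is an arbitrary example choice.

```python
import math
import sys

kabsch_rmsd_value = 6.16698752446e-16  # value reported above

# Machine epsilon for double precision, about 2.22e-16
eps = sys.float_info.epsilon

# The residual is only a few multiples of machine epsilon
ratio = kabsch_rmsd_value / eps

# Compare against zero with an absolute tolerance instead of ==
is_zero = math.isclose(kabsch_rmsd_value, 0.0, abs_tol=1e-12)

print(f"residual / eps = {ratio:.2f}, effectively zero: {is_zero}")
```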

msg = f"error: Parsing atomtype for the following line:" f" \n{line}"

Tried downloading the code and running it in the command line. This is what I got?

./calculate_rmsd.py pablo3a.xyz pablo3b.xyz
File "./calculate_rmsd.py", line 1467
msg = f"error: Parsing atomtype for the following line:" f" \n{line}"
^
SyntaxError: invalid syntax

Would you help me understand what this means?
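A likely reading, offered as an assumption rather than a confirmed diagnosis: f-strings (the f"..." literals on the reported line) were introduced in Python 3.6, so an older interpreter rejects that line with this kind of SyntaxError. A minimal check of the interpreter version, where supports_fstrings is a hypothetical helper name:

```python
import sys


def supports_fstrings(version_info=sys.version_info):
    """Return True if the given interpreter version can parse f-strings (3.6+)."""
    return tuple(version_info[:2]) >= (3, 6)


print(sys.version)
print("f-strings supported:", supports_fstrings())
```

If this prints False for the interpreter that ./calculate_rmsd.py resolves to, running the script with a newer Python 3 would be the first thing to try.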

Reorder with Hungarian and distance gives larger RMSD

I'm noticing some strange behavior when using the reorder option.
Sometimes, the RMSD after reordering is larger than without reordering.

For example, for the two structures below:
[hmcezar@bitz-dell teste]$ calculate_rmsd 47.xyz 48.xyz
1.670565405538033
[hmcezar@bitz-dell teste]$ calculate_rmsd 47.xyz 48.xyz -e
4.956142899937815

Is this an expected behavior?

The .xyz files are at https://pastebin.com/4srhWypu.

partially defunct structure export.

calculate_rmsd.py permits loading structural models, superposing them, determining the RMSD, and optionally exporting the second structure, aligned to the first one, in the *.xyz format. As far as I can tell, this structure export is broken in the master branch, version 1.3.0 (freshly checked out by me today to confirm the situation), and equally in the pull request by xg590. The problem occurs regardless of whether *.pdb or *.xyz files provide the model data, as soon as the optional -p parameter is combined with either --use-reflections or --use-reflections-keep-stereo.

I became aware of this issue while cross-checking with *.xyz model data derived from crystallographic models of tartaric acid. Initially, I aimed to check xg590's pull request: the option --use-reflections may invert the stereochemical information, which xg590's extension --use-reflections-keep-stereo aims to rectify. The test data are derived from crystallographic models of tartaric acid from the Cambridge Crystallographic Database, intentionally simplified to contain a single molecule of the compound; hydrogens were removed intentionally, too. For this bug report, only two data sets, derived from the entries TARTAC and TARTAC02, are considered; both describe the D-(-)-(2S,3S) isomer among the three possibilities.

After running the computation on the CLI, both the pristine first model and the second model, now described as "aligned", were displayed simultaneously in Jmol. Without any form of --use-reflections (no pun intended), the result is quite acceptable:

tartac00-tartac02-noreflectionsatall-master

With --use-reflections, however, the result is bad, as if the two molecules had merely been brought into the vicinity of each other:

tartac00-tartac02--use-reflections-master-xyz

In the attached *.zip archives, I enclose the raw *.xyz files of the models, the *.xyz exported by calculate_rmsd.py via python calculate_rmsd.py --reorder -p [--use-reflections] modelA modelB, as well as a *.png screenshot and a *.wrl export of the scene (viewable e.g. with view3dscene in Debian) for two runs with the current master branch, version 1.3.0, of the script.

Note that the RMSD computed for these models (about 0.098) is the same with or without --use-reflections or --use-reflections-keep-stereo, because they represent the same stereoisomer.

sameIsomer-noReflectionsAtAll-master.zip
sameIsomer--use-reflections-master.zip

.pdb coordinate reader can be improved

The get_coordinates_pdb function in calculate_rmsd.py would benefit from the inclusion of 'HETATM' when reading in coordinates and atoms.

if line.startswith("ATOM"):

I was able to change line 688 to:
if line.startswith("ATOM") or line.startswith("HETATM"):

This allowed the program to also read in .pdb files that contain HETATM records.
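The same check can be written with a tuple of prefixes, since str.startswith accepts one. The sketch below is a simplified stand-in for the reader, not the project's get_coordinates_pdb: it assumes records follow the fixed-column PDB layout (x, y, z in columns 31-54), and both function names are hypothetical.

```python
def pdb_record(kind, serial, name, resname, chain, resseq, x, y, z):
    """Build a fixed-column PDB coordinate record (helper for the example only)."""
    return "{:<6}{:>5} {:<4} {:>3} {:1}{:>4}    {:8.3f}{:8.3f}{:8.3f}".format(
        kind, serial, name, resname, chain, resseq, x, y, z
    )


def read_pdb_coordinates(lines):
    """Collect (x, y, z) tuples from both ATOM and HETATM records."""
    coordinates = []
    for line in lines:
        # A tuple of prefixes matches either record type in one call
        if line.startswith(("ATOM", "HETATM")):
            x = float(line[30:38])
            y = float(line[38:46])
            z = float(line[46:54])
            coordinates.append((x, y, z))
    return coordinates


lines = [
    "REMARK example input",
    pdb_record("ATOM", 1, "CA", "ALA", "A", 1, 1.0, 2.0, 3.0),
    pdb_record("HETATM", 2, "O", "HOH", "A", 101, -4.5, 0.25, 10.0),
]
coords = read_pdb_coordinates(lines)
print(coords)
```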

Wrong result for Kabsch RMSD

Try this:

P = [[-1., 0., 0.], [0., 2., 0.], [0., 1., 0.], [0., 1., 1.]]
Q = [[0., -1., -1.], [0., -1., 0.], [0., 0., 0.], [-1., 0., 0.]]

kabsch_rmsd(P, Q)

returns:

1.232398

but the correct least RMSD is:

0.519309

Function docstrings are a bit confusing about whether centroids are subtracted or not

This is a minor nitpick, but in particular the kabsch docstring is confusing (IMHO) in this respect:

def kabsch(P, Q):
    """
    The optimal rotation matrix U is calculated and then used to rotate matrix
    P unto matrix Q so the minimum root-mean-square deviation (RMSD) can be
    calculated.

    Using the Kabsch algorithm with two sets of paired point P and Q, centered
    around the centroid. Each vector set is represented as an NxD
    matrix, where D is the dimension of the space.

    The algorithm works in three steps:
    - a translation of P and Q
    - the computation of a covariance matrix C
    - computation of the optimal rotation matrix U

    http://en.wikipedia.org/wiki/Kabsch_algorithm

    Parameters
    ----------
    P : array
        (N,D) matrix, where N is points and D is dimension.
    Q : array
        (N,D) matrix, where N is points and D is dimension.

    Returns
    -------
    U : matrix
        Rotation matrix (D,D)

    Example
    -----
    TODO
    """

This mentions "a translation of P and Q", but this is not done in the function. So IMO the explanation of the Kabsch algorithm should be put somewhere else or differentiate between what is done in this function and what has to be done by its user, and it should be made clear that these functions (also e.g. quaternion_rotate) compute matrices for rotation around the origin, not around the centroids - which is great by the way, because it allows more general use of this library for finding optimal rotations around points other than the centroid.

I'll send a PR if you agree.
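To illustrate what is left to the caller, here is a minimal pure-Python sketch of the translation step that has to happen before a rotation about the origin is computed. The helpers centroid and center are illustrative stand-ins written for this example and are not claimed to match the library's own functions.

```python
def centroid(points):
    """Mean position of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def center(points):
    """Translate the points so that their centroid sits at the origin."""
    c = centroid(points)
    return [tuple(p[i] - c[i] for i in range(3)) for p in points]


P = [(-1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 1.0, 0.0), (0.0, 1.0, 1.0)]
P_centered = center(P)

# A rotation computed from centered sets is a rotation about the centroids;
# skipping this step gives a rotation about the origin instead.
print(centroid(P), centroid(P_centered))
```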

mirroring / inversion of relative alignments; increment version counter.

As tested today with a *.zip copy obtained Saturday (Nov. 3rd), the implementation of an optional mirroring and inversion of the test structures nicely automates what otherwise was performed manually with aRMSD.

Given these new achievements, it would be worth increasing the version counter, which today (Nov. 5) is still at 1.2.7, the state of the art in the second half of September.
