charnley / rmsd

Calculate Root-mean-square deviation (RMSD) of two molecules, using rotation, in xyz or pdb format

License: BSD 2-Clause "Simplified" License

Python 99.66% Makefile 0.34%

rmsd xyz pdb alignment molecule structure reordering kabsch atoms assignment

rmsd's Introduction

Calculate Root-mean-square deviation (RMSD) of Two Molecules Using Rotation

The root-mean-square deviation (RMSD) is calculated between two sets of Cartesian coordinates, in either .xyz or .pdb format, using the Kabsch algorithm (1976) or the Quaternion algorithm (1991) for rotation, resulting in the minimal RMSD.

For more information please read RMSD and Kabsch algorithm.

Motivation

You have molecules A and B and want to calculate the structural difference between the two. If you calculate the RMSD directly, you might get too large a value, as seen below. You first need to recenter the two molecules and then rotate them onto each other to get the true minimal RMSD. This is what this script does.

No Changes	Re-centered	Rotated

RMSD 2.50	RMSD 1.07	RMSD 0.25
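
The recentering and rotation steps described above can be sketched with plain NumPy. This is an illustrative sketch of the Kabsch procedure, not the package's own code; the function name kabsch_rmsd_sketch and the toy coordinates are assumptions made for this example.

```python
import numpy as np


def kabsch_rmsd_sketch(P, Q):
    """Minimal RMSD between two (natoms, 3) arrays after centering and rotation."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)

    # Step 1: move both centroids to the origin
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)

    # Step 2: optimal rotation from the SVD of the covariance matrix
    C = P.T @ Q
    V, S, W = np.linalg.svd(C)
    if np.linalg.det(V) * np.linalg.det(W) < 0.0:
        V[:, -1] = -V[:, -1]  # avoid an improper rotation (reflection)
    U = V @ W

    # Step 3: RMSD of the rotated P against Q
    diff = P @ U - Q
    return np.sqrt((diff ** 2).sum() / len(P))


# Toy example (hypothetical data): B is A rotated by 90 degrees about z and shifted
A = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
B = A @ Rz.T + np.array([5.0, -2.0, 1.0])

naive = np.sqrt(((A - B) ** 2).sum() / len(A))  # no recentering, no rotation
print(naive, kabsch_rmsd_sketch(A, B))  # the second value is zero up to rounding
```

Because B is an exact rigid-body copy of A, the naive value is large while the aligned value vanishes up to floating-point rounding.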

Citation

Implementation:

Calculate Root-mean-square deviation (RMSD) of Two Molecules Using Rotation, GitHub, http://github.com/charnley/rmsd, <git commit hash or version number>

Kabsch algorithm:

Kabsch W., 1976, A solution for the best rotation to relate two sets of vectors, Acta Crystallographica, A32:922-923, doi: http://dx.doi.org/10.1107/S0567739476001873

Quaternion algorithm:

Michael W. Walker, Lejun Shao and Richard A. Volz, 1991, Estimating 3-D location parameters using dual number quaternions, CVGIP: Image Understanding, 54:358-367, doi: http://dx.doi.org/10.1016/1049-9660(91)90036-o

Please cite this project when using it for scientific publications.

Installation

The easiest way is to get the program via PyPI under the package name rmsd,

pip install rmsd

or download the project from GitHub via

git clone https://github.com/charnley/rmsd

There is only one Python file, so you can also download calculate_rmsd.py and put it in your bin folder.

wget -O calculate_rmsd https://raw.githubusercontent.com/charnley/rmsd/master/rmsd/calculate_rmsd.py
chmod +x calculate_rmsd

Usage examples

Use calculate_rmsd --help to see all the features. Usage is straightforward: call calculate_rmsd with two structures in either .xyz or .pdb format. In this example, ethane has the exact same structure but is translated in space, so the RMSD should be zero.

calculate_rmsd tests/ethane.xyz tests/ethane_translate.xyz

It is also possible to ignore all hydrogens (useful for larger molecules where hydrogens move around and are indistinguishable) and print the rotated structure for visual comparison. The output will be in XYZ format.

calculate_rmsd --no-hydrogen --print tests/ethane.xyz tests/ethane_mini.xyz

If the atoms are scrambled and not aligned, you can use the --reorder argument, which will align the atoms of structure B onto A. Use --reorder-method to select the reordering method. Choose between Hungarian (default), distance (very approximate) and brute force (slow).

calculate_rmsd --reorder tests/water_16.xyz tests/water_16_idx.xyz

It is also possible to use RMSD as a library in other scripts, see example.py for example usage.

Problems?

Submit issues or pull requests on GitHub.

Contributions

Please note that we are using black with a line length of 99. The easiest way to abide by the code standard is to install the following package.

pip install pre-commit

and run the following command in your repository

pre-commit install

This will install a hook in your git repository that re-formats your code to adhere to the standard and checks for code quality.

rmsd's Issues

Problems with check_reflections function

I know this function is new to rmsd and so might not be fully vetted, but I am having issues with check_reflections. I tested it by using two sets of identical (fake) coordinates, and this function calculates an rmsd between them, as well as suggests reflecting the x-axis. The code I run is:

import calculate_rmsd as rmsd
import numpy as np

atoms = ["C","O","O"]
coord1 = np.array([[1.0,1.0,1.0],[2.0,2.0,2.0],[3.0,3.0,3.0]])
coord2 = coord1

atoms=np.array(atoms)

a, b, c, d = rmsd.check_reflections(atoms,atoms,coord1,coord2)
print("min_rmsd = %s" % a, "\nmin_swap = %s" % b, "\nmin_reflection = %s" % c, "\nmin_review = %s" % d)

The output is:

min_rmsd = 3.4641016151377526
min_swap = [0 1 2]
min_reflection = [-1 1 1]
min_review = [0 1 2]

Is there something I'm doing wrong in implementing this function, or is it a bug I can't figure out?

Reorder with Hungarian and distance gives larger RMSD

I'm noticing some strange behavior when using the reorder option.
Sometimes, the RMSD after reordering is larger than without reordering.

For example, for the two structures below:
[hmcezar@bitz-dell teste]$ calculate_rmsd 47.xyz 48.xyz
1.670565405538033
[hmcezar@bitz-dell teste]$ calculate_rmsd 47.xyz 48.xyz -e
4.956142899937815

Is this an expected behavior?

The .xyz files are at https://pastebin.com/4srhWypu.

infer the two coordinate sets for 3D data

Dear Charnley,

How can I infer the fitted coordinates for a pair of 3D datasets?
Besides the RMSD value, I would like to plot the two datasets next to each other.

Thanks in advance.
Best Regards,
Attila

Modify W instead of V

rmsd/rmsd/calculate_rmsd.py

Lines 155 to 157 in cd8af49

if d:
    S[-1] = -S[-1]
    V[:, -1] = -V[:, -1]

Why do we have to modify the value of S, it's not needed anywhere in the function? Also instead of modifying

V[:, -1] = -V[:, -1]

can't we just do this

W[-1]  = -W[-1]

Both properly change the sign of the rotation matrix, but in the latter case we only have to traverse the outer index. This can result in a (very) slight gain in computation time for a larger number of particles.
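
The equivalence claimed in this issue can be checked numerically. The sketch below uses a random 3x3 matrix as a stand-in for the covariance matrix (an assumption for illustration; it is not taken from calculate_rmsd.py) and compares the rotation matrix obtained by flipping the last column of V with the one obtained by flipping the last row of W.

```python
import numpy as np

rng = np.random.default_rng(0)
C = rng.normal(size=(3, 3))  # stand-in for a covariance matrix
V, S, W = np.linalg.svd(C)

# Variant 1: flip the last column of V
V1 = V.copy()
V1[:, -1] = -V1[:, -1]
U1 = V1 @ W

# Variant 2: flip the last row of W
W2 = W.copy()
W2[-1] = -W2[-1]
U2 = V @ W2

# Both variants compute V @ diag(1, 1, -1) @ W
print(np.allclose(U1, U2))  # True
```

S does not enter the product V @ W, so changing its sign has no effect on the rotation matrix itself.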

Citing charnley/rmsd (with a DOI)

Hi Jimmy,

Thanks a lot for having written (and distributing) charnley/rmsd. I have used your code as part of my PhD work and am looking at the best way of citing your work. Have you considered starting using Zenodo on your project? This would automatically give you a DOI for every release, and is very easy to set up.

Thanks, Bertrand

error: Structures not same size

HELLO,

I would like to calculate the RMSD between the native and the redocked ligand.

calculate_rmsd --reorder native.pdb redocked.pdb

native.txt
redocked.txt

error: Structures not same size

The number of atoms in both files is the same. Could you please help me to solve this simple error?

What does --normal, --kabsch and --quater do?

As far as I can tell these parameters are not used at all.

RMSD operation in a numpy array of molecules

I want to calculate RMSD between A and B, both having the following shape:

[1000,57,3], where the first dimension corresponds to the number of molecules, the second to the number of atoms, and the third to the x, y, z positions.

It would be great if this operation could be done without iterating over the first dimension. I've tried, but np.dot(A.T,B) throws an error, so I think I'll need to modify the code to accomplish this.

Please let me know your comments.
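
One possible way to avoid the loop is to let NumPy broadcast over the leading dimension, since np.linalg.svd, np.linalg.det and the @ operator all accept stacks of matrices. The following is a minimal sketch under that assumption; batched_kabsch_rmsd is a hypothetical helper, not a function of the rmsd package, and it pairs A[i] with B[i].

```python
import numpy as np


def batched_kabsch_rmsd(A, B):
    """RMSD for each pair (A[i], B[i]); A and B have shape (n_mol, n_atoms, 3)."""
    A = A - A.mean(axis=1, keepdims=True)
    B = B - B.mean(axis=1, keepdims=True)

    # One 3x3 covariance matrix per molecule: shape (n_mol, 3, 3)
    C = np.einsum("nia,nib->nab", A, B)
    V, S, W = np.linalg.svd(C)

    # Fix improper rotations molecule by molecule
    d = np.linalg.det(V) * np.linalg.det(W) < 0.0
    V[d, :, -1] *= -1

    U = V @ W  # rotation matrices, shape (n_mol, 3, 3)
    diff = A @ U - B
    return np.sqrt((diff ** 2).sum(axis=(1, 2)) / A.shape[1])


# Hypothetical data with the shape from the question
rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 57, 3))
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta), np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
B = A @ R + np.array([1.0, 2.0, 3.0])  # rigidly moved copies of A

values = batched_kabsch_rmsd(A, B)
print(values.shape, values.max())  # one value per molecule, all zero up to rounding
```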

printed structure does not obey --use-reflections

Using the --use-reflections option with the calculate_rmsd script affects the computed rmsd, but it doesn't seem to affect the output structure when -p is used too:

$ cat a.xyz
8

C     -0.75194898     0.15712980    -0.62038769
H     -1.32975650    -0.44037197    -1.31263409
H     -0.93962923     1.21050962    -0.77368922
C      0.75194846    -0.15712928    -0.62038611
H      0.93962586    -1.21050994    -0.77368854
H      1.32975920     0.44036972    -1.31263191
O     -1.10924506    -0.23012201     0.71758377
O      1.10924528     0.23012185     0.71758355

$ cat b.xyz 
8

C     -0.78959873     0.10620675    -0.52734402
H     -1.27819677    -0.85803167    -0.54888891
H     -0.99000386     0.63677309    -1.45273479
C      0.78962678    -0.10600531    -0.52735255
H      0.98992895    -0.63685564    -1.45258775
H      1.27843070     0.85811519    -0.54907319
O     -1.33661735     0.82461649     0.53999508
O      1.33642931    -0.82482111     0.53973588

$ calculate_rmsd a.xyz b.xyz 
0.8308965795175342

$ calculate_rmsd a.xyz b.xyz --use-reflections
0.3006966184789406

$ calculate_rmsd a.xyz b.xyz -p > b1.xyz

$ calculate_rmsd a.xyz b.xyz --use-reflections -p > b2.xyz

$ calculate_rmsd a.xyz b1.xyz
0.8308965795175342

$ calculate_rmsd a.xyz b2.xyz
0.8308965795175342

$ diff -s b1.xyz b2.xyz
Files b1.xyz and b2.xyz are identical

$ calculate_rmsd --version
rmsd 1.3.2

See https://github.com/charnley/rmsd for citation information

Moreover, with the latest pip version, --use-reflections does almost nothing at all:

$ calculate_rmsd --version
rmsd 1.4

See https://github.com/charnley/rmsd for citation information

$ calculate_rmsd a.xyz b.xyz 
0.830896579517534

$ calculate_rmsd a.xyz b.xyz --use-reflections
0.8308965795175339

rmsd package requires typing_extensions but missing from setup.py

Create a new virtual environment:

% python3 -m venv /tmp/v3
% source /tmp/v3/bin/activate
(v3) % pip install rmsd
Collecting rmsd
Using cached rmsd-1.5.0-py3-none-any.whl (17 kB)
Collecting scipy
Using cached scipy-1.9.3-cp310-cp310-macosx_10_9_x86_64.whl (34.3 MB)
Collecting numpy
Using cached numpy-1.24.0-cp310-cp310-macosx_10_9_x86_64.whl (19.8 MB)
Installing collected packages: numpy, scipy, rmsd
Successfully installed numpy-1.24.0 rmsd-1.5.0 scipy-1.9.3

% python3
Python 3.10.9 (main, Dec 15 2022, 18:25:35) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

>>> from rmsd.calculate_rmsd import (NAMES_ELEMENT, centroid, check_reflections, rmsd)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/private/tmp/v3/lib/python3.10/site-packages/rmsd/__init__.py", line 2, in <module>
    from .calculate_rmsd import *
  File "/private/tmp/v3/lib/python3.10/site-packages/rmsd/calculate_rmsd.py", line 25, in <module>
    from typing_extensions import Protocol
ModuleNotFoundError: No module named 'typing_extensions'

add mass-weight to RMSD calculations

Hi all!

I trust you guys are all safe and healthy during this COVID-19 pandemic.

Quick question: is there any way to calculate the RMSD contemplating the mass-weighting? If not, would someone be willing to add it? :-)

Please see http://archive.ambermd.org/200805/0482.html for further info.

Compared to, for example, QMol, it seems like enabling/disabling such an option yields different results.

For example:
Results from calculate_rmsd.py and QMol with "mass weighting" disabled:
01 vs v2 10.902
v1 vs v3 0.179
v1 vs v4 5.950
v1 vs v5 10.929

They are identical.

Now with QMol and "mass weighting" enabled:
01 vs v2 10.874
v1 vs v3 0.179
v1 vs v4 5.933
v1 vs v5 10.902

As you can appreciate, there is a slight difference but it makes a rather huge change...

QMol's source-code can be downloaded here:
http://www.ccl.net/cca/software/MS-WIN95-NT/qmol/index.shtml

I only found references to "mass weight" in QMol's kabsch.cpp and kabsch.h

I know that there is a weighted Kabsch algorithm (thanks Jimmy for the info)
https://github.com/charnley/rmsd/blob/master/rmsd/calculate_rmsd.py#L229

but, from what I see in the calculate_rmsd.py options, there are no ways to call that function.

Volunteers? :-)

Thanks a lot in advance for your time and consideration!

I look forward to hearing from you soon!

Please stay safe!

Best,
Martin
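
For reference, a mass-weighted RMSD is commonly defined as sqrt(sum_i m_i |p_i - q_i|^2 / sum_i m_i). The sketch below shows only this weighting step on coordinates that are assumed to be superimposed already; it is not QMol's implementation and not an existing calculate_rmsd option, and the function name, coordinates and masses are illustrative assumptions.

```python
import numpy as np


def mass_weighted_rmsd(P, Q, masses):
    """Mass-weighted RMSD between two already superimposed (natoms, 3) arrays."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    m = np.asarray(masses, dtype=float)
    sq_dist = ((P - Q) ** 2).sum(axis=1)  # squared displacement per atom
    return np.sqrt((m * sq_dist).sum() / m.sum())


# Hypothetical two-atom example: only the light atom moves
P = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
Q = [[0.0, 0.0, 0.0], [1.0, 0.0, 2.0]]

plain = mass_weighted_rmsd(P, Q, [1.0, 1.0])      # equal weights: ordinary RMSD
weighted = mass_weighted_rmsd(P, Q, [12.0, 1.0])  # heavy atom dominates
print(plain, weighted)
```

With equal masses the function reduces to the ordinary RMSD; displacements of light atoms count less once real masses are used. A full mass-weighted superposition would also center on the center of mass and weight the rotation fit, which this sketch does not do.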

msg = f"error: Parsing atomtype for the following line:" f" \n{line}"

Tried downloading the code and running it in the command line. This is what I got?

./calculate_rmsd.py pablo3a.xyz pablo3b.xyz
File "./calculate_rmsd.py", line 1467
msg = f"error: Parsing atomtype for the following line:" f" \n{line}"
^
SyntaxError: invalid syntax

Would you help me understand what this means?

error: Structures not same size

Sorry to bother you with this little problem
The two PDB files I want to compare have different sizes.
Does "error: Structures not same size" mean that the two structures need to be the same size?

Fails to read XYZ files written by ASE

Example

19
Properties=species:S:1:pos:R:3:forces:R:3 2S-2-Amino-3-methylbutanoic=T acid=T energy=-10875.507103851405 dipole="-0.11693122986387085 0.2720462997103968 0.17472161467512684" magmom=0.0 pbc="F F F"
C       0.22656880      -0.56580271       0.37053473       0.00113129      -0.01146712       0.00236542
N       1.53104275      -1.25352933       0.37698414      -0.00611923       0.00493652       0.00447372
C       0.24487845       0.94565579       0.82884627      -0.00375381      -0.00766189      -0.01280409
C      -1.18278502       1.53035594       0.91007832      -0.00606780      -0.00133697      -0.00051422
C       0.94733822       1.06970643       2.19799655       0.00257110       0.00982161      -0.01501524
C      -0.22962089      -0.60972299      -1.08178558       0.00298248      -0.00293106       0.00534789
O       0.51218343      -0.53807897      -2.07032273      -0.00113129       0.00951308      -0.00071991
O      -1.61832134      -0.67908207      -1.18057738       0.00833037       0.00956450       0.00287964
H      -0.50313117      -1.10973073       0.99585130      -0.00431945      -0.00077133       0.00025711
H       2.01037607      -1.19295784       1.28583059      -0.01007873      -0.01013015       0.00190262
H       2.11635497      -0.89127820      -0.39447838      -0.00961593      -0.01177565      -0.01110717
H       0.83331353       1.49436672       0.06509178       0.00406234       0.01362685       0.01254698
H      -1.74131800       1.35179101      -0.02207033       0.00478225      -0.01043868      -0.00380523
H      -1.14539339       2.61527111       1.11156942      -0.00133697       0.00385666       0.00534789
H      -1.73810741       1.04465878       1.73441880       0.00622207      -0.01064437       0.00421661
H       0.99277380       2.12810382       2.50783434      -0.00390808       0.01177565       0.01383254
H       1.98011451       0.68259268       2.15005599       0.00725051       0.00128555       0.00642776
H       0.38544800       0.50413825       2.96567565       0.00833037       0.00930739      -0.00658202
H      -1.83911536      -0.65210679      -2.16786412       0.00056564      -0.00647918      -0.00910171

Error:
Reading the .xyz file failed in line 2. Please check the format.
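
A reader that tolerates such files could treat the second line as a free-form comment and use only the first four columns of each atom line. The sketch below illustrates that idea; read_xyz_lenient is a hypothetical helper, not the parser in calculate_rmsd.py, and it assumes the element symbol and positions come first on each line, as in the example above.

```python
def read_xyz_lenient(text):
    """Return (atoms, coords) from XYZ text, ignoring extra per-atom columns."""
    lines = text.strip("\n").splitlines()
    n_atoms = int(lines[0].split()[0])
    atoms, coords = [], []
    # Line 2 is a comment in plain XYZ; extended XYZ stores key=value data there
    for line in lines[2 : 2 + n_atoms]:
        tokens = line.split()
        atoms.append(tokens[0])
        coords.append([float(x) for x in tokens[1:4]])  # drop forces etc.
    if len(atoms) != n_atoms:
        raise ValueError("fewer atom lines than announced in the header")
    return atoms, coords


# Shortened, hypothetical two-atom version of the file above
example = """2
Properties=species:S:1:pos:R:3:forces:R:3 energy=-1.0 pbc="F F F"
C 0.22656880 -0.56580271 0.37053473 0.00113129 -0.01146712 0.00236542
N 1.53104275 -1.25352933 0.37698414 -0.00611923 0.00493652 0.00447372
"""

atoms, coords = read_xyz_lenient(example)
print(atoms, coords)
```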

error: Structures not same size

I used pd2_ca2main to generate a PDB file.
I am trying to compare the new file with the old one, but I can't.

user_name@server_name:~/bbq_spatial$ python3 calculate_rmsd.py  ./bbq_input_pdb/pdb1akp.pdb  ./bbq_output_pdb/pdb1akp.pdb
error: Structures not same size
user_name@server_name:~/bbq_spatial$

Issue with Importing RMSD in Python

Hello.

Upon installing rmsd, I tried using the functions part of rmsd, as I am working on a script. However, I keep getting the following error message (even when uninstalling/reinstalling):

AttributeError: module 'rmsd' has no attribute 'kabsch_rmsd'

I'd like to know how I can work around this issue. Thanks!

reordering while preparing for output

Hi,
using the module I got geometries which, in my humble opinion, are too far from each other for a very small RMSD calculated. These molecules look similar if I take original geometries.
When the --reorder option is turned on, the q_coord list is reordered (lines 2021-2022). Aren't the lines (2046-2047)

        if q_review is not None:
            q_coord = q_coord[q_review]
            q_all_atoms = q_all_atoms[q_review]

changing their order again, when output is requested? When I commented out the mentioned part, the new geometries look more aligned with each other and my expectations.

cite

mirroring / inversion of relative alignments; increment version counter.

As tested today with a *.zip copy obtained Saturday (Nov. 3rd), the implementation of
an optional mirroring and inversion of the test structures nicely automates what
otherwise was performed manually with aRMSD.

It really merits to increase the version counter (still today [Nov. 5] at the level 1.2.7, which
was about the state of the art in the second half of September) -- despite your new achievements.

numpy needs to be installed first, separately, because of rmsd's setup.py structure

Currently the rmsd setup.py has install_requires=['numpy',]. In theory this should mean that you can do pip install rmsd on a system with no numpy installed, and pip will install both rmsd and numpy.

However currently this is wasted, and trying to do so will throw a ModuleNotFoundError: No module named 'numpy'. This is because rmsd's setup.py imports rmsd, which imports numpy, which hasn't been installed yet.

I think this could be fixed by changing import rmsd to from rmsd import __version__. I can make a pull request if you would prefer.
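
Note that any import of the package, including from rmsd import __version__, still runs the package's module-level imports. A common alternative is to read the version string from the source file as text. The sketch below shows that approach; read_version is a hypothetical helper, and the file it is pointed at in a real setup.py (for example rmsd/calculate_rmsd.py) is an assumption about where __version__ is defined.

```python
import os
import re
import tempfile
from pathlib import Path


def read_version(path):
    """Extract __version__ from a Python source file without importing it."""
    text = Path(path).read_text()
    match = re.search(r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", text, re.MULTILINE)
    if match is None:
        raise RuntimeError(f"no __version__ found in {path}")
    return match.group(1)


# Demonstration on a throwaway file whose import would fail
with tempfile.TemporaryDirectory() as tmp:
    module = os.path.join(tmp, "fake_module.py")
    Path(module).write_text('import package_that_is_not_installed\n__version__ = "1.2.3"\n')
    found = read_version(module)

print(found)  # 1.2.3
```

In setup.py this would be used as version=read_version(...) in place of importing the package.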

Otherwise it is currently hard to make rmsd a dependency because then installing those projects with pip will fail.

Thanks!

How to align both position and orientation?

Thanks for the great library! It looks like just what I was looking for apart from one thing and I was wondering if someone could leave some comments.

I have two sets, P and Q. Elements of P correspond to elements of Q. Each element has a set of three coordinates in the 3D space and also orientation coordinates (which can be expressed using Euler angles, quaternions, rotational matrix, anything that is most convenient). How to align P and Q?

According to the Quaternion Algorithm paper [2] cited in the README, both position and orientation are getting aligned. Quoting from the abstract

This paper describes a new algorithm for estimating the position and orientation of objects (...)

However, when I look up the tests I see that even with both positions and orientations available, only the position coordinates are taken into account

rmsd/tests/test_quaternion.py

Line 24 in 55eebc3

result = rmsd.quaternion_rmsd(p_coord, q_coord)

I think I might have misunderstood something. Was the orientation alignment purposely skipped (as it might not be relevant to the usual use case)? Could someone recommend how to solve my problem? Perhaps a different algorithm would be more appropriate.

Many thanks in advance!

Issue with capitalization of elements >1 letter

I noticed this issue

atoms, coords = rmsd.get_coordinates(file_with_chlorine, "xyz")
print(atoms)

Gives, for example

['C' 'C' 'CL' 'C' 'C' 'H' 'H' 'H' 'H' 'H' 'H' 'H']

Even if the input file denotes chlorine as "Cl".

This means that

output = rmsd.set_coordinates(atoms, aligned_coordinates, title="wtf")

produces output with CL rather than Cl.

IMO it should either (1) preserve the input capitalization or (2) output the standard capitalization.

I will make a PR with (1) unless there are good reasons for (2), or if this is the intended behavior?
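
For comparison, option (2) could be as small as the following sketch. normalize_symbol is a hypothetical helper, not part of the rmsd package; it relies on str.capitalize, which upper-cases the first character and lower-cases the rest.

```python
def normalize_symbol(symbol):
    """Return an element symbol in standard capitalization, e.g. 'CL' -> 'Cl'."""
    return symbol.strip().capitalize()


atoms = ["C", "C", "CL", "C", "h", "BR"]
normalized = [normalize_symbol(a) for a in atoms]
print(normalized)  # ['C', 'C', 'Cl', 'C', 'H', 'Br']
```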

partially defunct structure export.

calculate_rmsd.py permits loading of structural models, their superposition, determination of the RMSD and optional export of the second structure, then aligned to the first one, in the *.xyz format. As I notice, this structure export is badly affected already in the master branch version 1.3.0 (which, to ascertain the situation, I freshly checked out today), and equally in the pull request by xg590. It is observed regardless of whether *.pdb or *.xyz data provide the models to scrutinize, once the optional --p parameter is used while simultaneously requesting either --use-reflections or --use-reflections-keep-stereo.

I became aware of this issue while cross-checking with *.xyz model data derived from crystallographic models of tartaric acid. Initially, I aimed to check xg590's pull request: the option --use-reflections may invert the stereochemical information, which xg590's extension --use-reflection-keep-stereo aims to rectify. The test data for this query are derived from crystallographic models of tartaric acid from the Cambridge Crystallographic Database, intentionally simplified to contain nothing but one single molecule of the compound; hydrogens were intentionally removed, too. For this bug report, only two data sets derived from the entries TARTAC and TARTAC02 are considered, both of which are the D-(-)-(2S,3S) isomer among the three possibilities.

After running the computation on the CLI, both the pristine first model and the second model, now described as "aligned", were displayed simultaneously with Jmol. When refraining from any form of --use-reflection (no pun intended), the result is well acceptable:

The situation however is bad for --use-reflection as if the two molecules were solely brought in vicinity of each other only:

In the attached *.zip archives, I enclose the raw *.xyz files of the models, the *.xyz exported by calculate_rmsd.py along python calculate_rmsd.py --reorder -p [--use-reflection] modelA modelB, as well as a *.png screen photo and a *.wrl export of the scene (accessible e.g. by view3dscene in Debian) of two runs with the current master branch version 1.3.0 of the script.

It is noteworthy that the test on these models is not affected if the script is used with or without the optional parameter of --use-reflection or --use-reflections-keep-stereo (RMSD about 0.098) because they represent the same stereoisomer.

sameIsomer-noReflectionsAtAll-master.zip
sameIsomer--use-reflections-master.zip

Pip version has slight error

If installing directly from pip (i.e. pip install rmsd), running with --output does not generate the correct geometry. It appears the pip version is missing the step where p_all is translated by Qc (Q centroid coordinates):

pip version:

U = kabsch(P, Q)
p_all -= Pc
p_all = np.dot(p_all, U)
write_coordinates(p_atoms, p_all, title="{} translated".format(args.structure_a))
quit()

GitHub version:

U = kabsch(P, Q)
p_all -= Pc
p_all = np.dot(p_all, U)
p_all += Qc
write_coordinates(p_atoms, p_all, title="{} translated".format(args.structure_a))
quit()

Kabsch not zero for same file

When using

calculate_rmsd configuration1.xyz configuration1.xyz

the result should be 0, since the data sets are identical. However, in case of attached file, they return

Normal RMSD: 0.0
Kabsch RMSD: 6.16698752446e-16
Quater RMSD: 0.0

Close enough, but why?

Example: 2-bromo-cyclohexan-1-one conformation1.xyz

17
SCF done       0.00000000
C          0.21071       -0.08068       -1.01215
C         -0.22200        1.24756       -0.37911
C          0.16684        1.30751        1.08071
C         -0.22332        0.10072        1.90828
C          0.16529       -1.20784        1.23556
C         -0.32656       -1.26651       -0.21039
H          1.30419       -0.12448       -1.04911
H         -0.16157       -0.12916       -2.03942
H         -1.30963        1.34512       -0.43942
H          0.22983        2.09315       -0.90165
H          0.22862        0.18105        2.89713
H          1.25813       -1.28534        1.25583
H         -0.24087       -2.04734        1.80504
H         -1.42103       -1.25606       -0.22425
H          0.00821       -2.20042       -0.66980
O          0.74759        2.24101        1.56474
Br        -2.17038        0.22320        2.17266

Wrong result for Kabsch RMSD

Try this:

P = [[-1., 0., 0.], [0., 2., 0.], [0., 1., 0.], [0., 1., 1.]]
Q = [[0., -1., -1.], [0., -1., 0.], [0., 0., 0.], [-1., 0., 0.]]

kabsch_rmsd(P, Q)

returns:

1.232398

but the correct least RMSD is:

0.519309

Align N molecules to M other molecules

Code if this will be implemented in the future.

Where P is (N, natoms, 3) and Q is (M, natoms, 3).

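import numpy as np

# P has shape (N, natoms, 3), Q has shape (M, natoms, 3); each structure is
# assumed to be centered on its centroid already.
# Covariance matrix for every (P[n], Q[m]) pair, shape (N, 3, 3, M)
C = np.dot(np.transpose(P, [0, 2, 1]), Q.T)
V, S, W = np.linalg.svd(np.transpose(C, [0, 3, 1, 2]))
d = (np.linalg.det(V) * np.linalg.det(W)) < 0.0
S[d, -1] *= -1
V[d, :, -1] *= -1

# Rotation matrix for every pair, shape (N, M, 3, 3)
U = np.matmul(V, W)

# Rotate P[n] by U[n, m] (P[n] @ U[n, m]), shape (N, natoms, 3, M)
P = np.einsum('ijk,iakb->ijba', P, U)

delta = P - np.transpose(Q, [1, 2, 0])

# Mean over atoms, then square root; result has shape (N, M)
rmsd = np.sqrt(np.sum(delta**2, axis=(1, 2)) / P.shape[1])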

Redundant matrix calculation in quaternion method

Since the coordinates have been moved to the centroid before applying the quaternion rotation, the matrices C2, C3 and A become redundant and the minimal eigenvalue of C1 minimizes the RMSD.

def quaternion_rotate(X, Y):
    """
    Calculate the rotation
    """
    N = X.shape[0]
    W = np.asarray([makeW(*Y[k]) for k in range(N)])
    Q = np.asarray([makeQ(*X[k]) for k in range(N)])
    Qt_dot_W = np.asarray([np.dot(Q[k].T, W[k]) for k in range(N)])
    W_minus_Q = np.asarray([W[k] - Q[k] for k in range(N)])
    A = -np.sum(Qt_dot_W, axis=0)
    eigen = np.linalg.eigh(A)
    r = eigen[1][:,eigen[0].argmin()]
    rot = quaternion_transform(r)
    return rot

Willing to add the rmsd value to title line?

Sometimes I am interested in knowing the rmsd value for the modified structure. Now I need to run calculate_rmsd.py twice (with and without -p).
With this simple change the rmsd value is printed in the title line of the xyz file.

line 2052 from:
xyz = set_coordinates(q_all_atoms, q_coord, title=f"{settings.structure_b} - modified")
to
xyz = set_coordinates(q_all_atoms, q_coord, title=f"{settings.structure_b} - modified. RMSD = {result_rmsd}")

Output rotation matrix

Is there a flag that would output not (only) the minimized rmsd and the rotated B structure, but the rotation matrix (and translation vector) that needs to be applied? This would be useful when the same transformation has to be applied to other data, like multipole moments, surrounding molecules, etc.

Edit: I'm talking about the command-line script calculate_rmsd.
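
Outside the command-line script, the rotation matrix and translation vector can be obtained with a few lines of NumPy. The sketch below is one way to do it under the usual Kabsch conventions; fit_transform is a hypothetical helper rather than a flag or function of calculate_rmsd, and the coordinates are made up.

```python
import numpy as np


def fit_transform(P, Q):
    """Return (U, t) such that P @ U + t is the best rigid fit of P onto Q."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P.mean(axis=0)
    Qc = Q.mean(axis=0)

    C = (P - Pc).T @ (Q - Qc)
    V, S, W = np.linalg.svd(C)
    if np.linalg.det(V) * np.linalg.det(W) < 0.0:
        V[:, -1] = -V[:, -1]  # keep a proper rotation
    U = V @ W

    t = Qc - Pc @ U  # so that (P - Pc) @ U + Qc == P @ U + t
    return U, t


# Hypothetical data: Q is P rotated by 90 degrees about x and shifted
P = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
R = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
shift = np.array([0.5, -1.0, 2.0])
Q = P @ R + shift

U, t = fit_transform(P, Q)
dipole = np.array([0.0, 1.0, 0.0])
rotated_dipole = dipole @ U  # direction-like data is rotated but not translated
print(U.round(6), t.round(6), rotated_dipole.round(6))
```

Positions of other atoms are transformed with X @ U + t, while vector quantities such as dipoles or forces only need X @ U.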

Why these two pdbs can not calculate RMSD?

The first file contains hydrogens, but when I use --no-hydrogen and --reorder, it still reports that the structures are not the same size. They both have 169 residues.

--reorder-method qml --reorder-method none currently not available

This issue report evolves from an earlier discussion here.

With the test data provided in the .zip attached below, the RMSD of an alignment works fine e.g.,

$ calculate_rmsd --reorder --reorder-method brute 1.xyz 2.xyz 
0.0012379383034675595

However, this is not the case opting for qml:

$ calculate_rmsd --reorder --reorder-method qml 1.xyz 2.xyz 
Traceback (most recent call last):
  File "/usr/local/bin/calculate_rmsd", line 8, in <module>
    sys.exit(main())
  File "/home/USER/.local/lib/python3.9/site-packages/rmsd/calculate_rmsd.py", line 1965, in main
    q_review = reorder_method(p_atoms, q_atoms, p_coord, q_coord)
  File "/home/USER/.local/lib/python3.9/site-packages/rmsd/calculate_rmsd.py", line 829, in reorder_similarity
    p_vecs = qml.representations.generate_fchl_acsf(
AttributeError: module 'qml.representations' has no attribute 'generate_fchl_acsf'

even though qml is installed with pip for Python 3 (version 0.2.1, Anders S. Christensen (2016), according to the version information from import qml in Python and a subsequent help(qml)). Its PyPI page states version 0.4.0.27 as the one currently provided, suggesting a discrepancy.

Second, contrasting to the documentation displayed by calculate_rmsd --help, the level none does not work:

calculate_rmsd --reorder --reorder-method none 1.xyz 2.xyz 
Traceback (most recent call last):
  File "/usr/local/bin/calculate_rmsd", line 8, in <module>
    sys.exit(main())
  File "/home/USER/.local/lib/python3.9/site-packages/rmsd/calculate_rmsd.py", line 1965, in main
    q_review = reorder_method(p_atoms, q_atoms, p_coord, q_coord)
UnboundLocalError: local variable 'reorder_method' referenced before assignment

The observations refer to an installation in Linux Debian 12/bookworm (branch testing), Python 3.9.10, and rmsd 1.4 installed via pip for Python 3.

2022-03-04_data.zip

How to get full transformation in a script?

I'd like to apply a full alignment, with possible reflection and reordering of atoms, and I need to re-use the resulting transformation matrix(ces) to transform other data (e.g. forces). With -e -ur I can do the alignment in command line, but how can I do the same inside a script (where the coordinates are not necessarily written in a file) and get the transformations?

Incorrect values for proteins

Hi there, so I'm not sure where exactly to put this - and it might be my fault - but I am having problems with some protein comparisons.

I have taken the protein 1L14 from PDB, taken the sequence and run it through AlphaFold to see how it folds. I've also run 1L14 through the Rosetta scoring function just to have it attach the missing hydrogens so they can be compared with the AlphaFold results. AlphaFold gave some nice results, which I've compared through PyMol, and that gives me an RMSD of 0.393. I'm submitting all of them here (original 1L14; 1L14 with hydrogens as given by Rosetta; and the best AlphaFold result) as TXT files, since PDBs cannot be uploaded.

1L14.txt
1L14-withH.txt
ranked_0.txt

However, when I use calculate_rmsd --reorder 1L14-withH.pdb ranked_0.pdb to get the result, I am getting RMSD of 10.84, and using --no-hydrogen does not make it much better (10.82). This value is about 30 times higher than what PyMol gives me.

My question is how, why, and is it my mistake? Or have I missed something? The structure has 164 residues and around 2400 atoms, so it cannot be that I forgot to divide by something. Test examples work fine (ethane/translated ethane gives me something very close to zero). Thanks for any help.

saving rotated coordinate?

Hi,
I came to know about this tool from openbabel github page.
I'm interested in translating and then rotating a pdb structure such that a specified atom (X) is at the center and the X-Y bond is aligned in the -z direction.

Thus I'm just curious whether it is possible to save rotated structures in addition to printing the RMSD with this "rmsd" program.
Looking forward to valuable suggestions.
thank you and best regards,
Vaibhav

be aware of reflection operation

reflection operation would not be aligned after centered or kabsch rotation.

pdb coordinate reader: error: Parsing coordinates for the following line

if x_column == None:
    try:
        # look for x column
        for i, x in enumerate(tokens):
            if "." in x and "." in tokens[i + 1] and "." in tokens[i + 2]:
                x_column = i
                break
    except IndexError:
        exit("error: Parsing coordinates for the following line: \n{0:s}".format(line))

If the pdb line is like 'ATOM 383 C6 C B 122 -2.217 -2.542-103.749' (the values of y and z run together), the code will exit, and the coordinates cannot be obtained.
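
Because PDB is a fixed-width format (x in columns 31-38, y in 39-46, z in 47-54, counted from 1), slicing by column avoids the problem of adjacent values without a separating space. The following sketch illustrates this; pdb_xyz is a hypothetical helper, not the reader in calculate_rmsd.py, and the record is rebuilt with a format string because the whitespace of the line quoted above was not preserved.

```python
def pdb_xyz(line):
    """Read x, y, z from an ATOM/HETATM record using the fixed PDB columns."""
    # 1-based columns 31-38, 39-46 and 47-54 map to these 0-based slices
    return float(line[30:38]), float(line[38:46]), float(line[46:54])


# Rebuild a fixed-width record with the values from the line quoted above
line = "ATOM  {:5d} {:<4s} {:>3s} {:1s}{:4d}    {:8.3f}{:8.3f}{:8.3f}".format(
    383, " C6", "C", "B", 122, -2.217, -2.542, -103.749
)

print(line.split()[-1])  # whitespace splitting fuses y and z into one token
print(pdb_xyz(line))     # column slicing recovers all three values
```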

error: cannot reorder atoms and print result with a view

Hello,

The atoms of my structures A and B are not in the same order so I am trying to use --reorder to align the atoms. I also need to print the rotated structure in an output file for further use.
However when I use --reorder and --print at the same time (python calculate_rmsd.py --reorder --no-hydrogen --print A.xyz B.xyz > C.xyz), I have the following error message:
error: cannot reorder atoms and print result with a view.

Could you please help me with this?

Many Thanks

feature suggest: output of *.xyz .AND. RMSD

I would like to suggest an additional option that outputs the aligned
structure B in *.xyz format together with the RMSD in one step.
For larger clusters that take longer to analyze, and with the
single-core computation issue still pending, this may cut the
computation time by half.

My understanding of the argparse section (lines #795 and
following) is too limited to achieve this by adding a line like
parser.add_argument('P', action='store_true', help='print out structure B, after refined fit in XYZ format and provide RMSD')
(led by an upper-case P), tentatively placed on line #1040 and
followed by an edited copy-paste of the section from line #999 up to
and including line #1039.

What worked for me as a shortcut was copy-pasting line #1036 just
after line #1023 to obtain a permanent record of the RMSD together with
the corresponding alignment. However, this works only when either
--use-reflections -p
or
--use-reflections-keep-stereo -p
is used. The numeric value of the RMSD is nevertheless the same as when
calling calculate_rmsd.py without --use-reflections or
--use-reflections-keep-stereo and without the optional -p parameter.
Perhaps I missed an entry corresponding to the filenames of fileA and
fileB, which would ease generating a permanent record of the aligned
molecule B when the script is called within a loop; for now, both types
of information can be harvested from the piped CLI output via grep.

The attached *.txt is the slightly modified script, meant as a source of
inspiration rather than as a commit.
20190326-modified-calculate_rmsd.py.txt
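
As an illustration of the requested output, the RMSD can travel in the comment line of the XYZ file, since the second line of that format is free text. This is a hedged pure-Python sketch; `format_xyz` is a hypothetical helper and not part of calculate_rmsd.py, and the coordinates and RMSD value below are placeholders.

```python
def format_xyz(atoms, coords, comment=""):
    """Return an XYZ-format string: atom count, comment line, one line per atom."""
    lines = [str(len(atoms)), comment]
    for atom, (x, y, z) in zip(atoms, coords):
        lines.append("%-2s %15.8f %15.8f %15.8f" % (atom, x, y, z))
    return "\n".join(lines)


# Placeholder data: in practice these would be the aligned structure B and its RMSD
atoms = ["H", "H"]
aligned = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.74)]
print(format_xyz(atoms, aligned, comment="RMSD: %.6f" % 0.25))
```

A loop over many pairs could then recover the value with grep on the second line of each output file.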

How to calculate rotation matrix using rmsd?

How can I calculate the rotation matrix between two paired sets of points using rmsd?
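
For reference, the Kabsch step itself fits in a few lines of NumPy. The sketch below is an independent illustration and not the package's own code; `kabsch_rotation`, `rmsd` and `align` are names chosen here, and the convention assumed is that rows are points and the rotation is applied as `P @ U`.

```python
import numpy as np


def kabsch_rotation(P, Q):
    """Rotation U minimizing ||P @ U - Q|| for (N, 3) arrays P and Q.

    Both arrays are expected to be centered already; the rotation is
    computed about the origin.
    """
    C = P.T @ Q                  # covariance matrix
    V, S, Wt = np.linalg.svd(C)
    # Flip one axis if needed so that U is a proper rotation (det = +1)
    if np.linalg.det(V) * np.linalg.det(Wt) < 0.0:
        V[:, -1] = -V[:, -1]
    return V @ Wt


def rmsd(A, B):
    """Root-mean-square deviation between paired rows of A and B."""
    return float(np.sqrt(((A - B) ** 2).sum(axis=1).mean()))


def align(P, Q):
    """Center both point sets, then return the rotation and the minimal RMSD."""
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    U = kabsch_rotation(Pc, Qc)
    return U, rmsd(Pc @ U, Qc)
```

Keeping the centering in `align` and out of `kabsch_rotation` leaves the choice of rotation center to the caller.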

Pre-specify residuals?

I'm looking for a method that lets me specify a list of distances to minimize the RMSD against. i.e. Given two point clouds each with size N, and a 1-1 correspondence, as opposed to having the distances be as close to zero as possible between each pair, have it instead be as close to some distance k_i (varying for each pair).

Citations and license are outputted to terminal with the help message

There is no reason to print these messages when the user asks for the help message:

license:
  https://github.com/charnley/rmsd/blob/master/LICENSE

citation:
  Kabsch algorithm:
    Kabsch W., 1976, A solution for the best rotation to relate two sets of
    vectors, Acta Crystallographica, A32:922-923, doi:10.1107/S0567739476001873

  Quaternion algorithm:
    Michael W. Walker and Lejun Shao and Richard A. Volz, 1991, Estimating 3-D
    location parameters using dual number quaternions, CVGIP: Image
    Understanding, 54:358-367, doi: 10.1016/1049-9660(91)90036-o

  Implementation:
    Calculate RMSD for two XYZ structures, GitHub,
    http://github.com/charnley/rmsd

This is fine in the GitHub README, but in the terminal we only need the usage information.

Reordering two differently sized but similar molecules

I'm trying to reorder two differently sized molecules so that they share as much of the atom numbering as possible. I see you've got this feature for molecules of the same size, which is brilliant, but how could I go about comparing two differently sized molecules? Any help would be much appreciated!

Cheers,

Jon

ValueError: operands could not be broadcast together with shapes (3) (0)

While running rmsd on these two xyz files, I get the following error:

fedor@slater:~/tmp$ python calculate_rmsd.py B2H6_.xyz B2H6.xyz
Normal RMSD: 0.0205857986547
Traceback (most recent call last):
File "calculate_rmsd.py", line 168, in
Qc = centroid(Q)
File "calculate_rmsd.py", line 92, in centroid
C = sum(X)/len(X)
ValueError: operands could not be broadcast together with shapes (3) (0)

The two files are:
(B2H6.xyz):

B2H6
optimized geometry:
B1 0.000000 0.000000 0.897697
B2 0.000000 0.000000 -0.897697
H3 -0.964674 0.000000 0.000000
H4 0.964674 0.000000 0.000000
H5 0.000000 1.031433 1.471441
H6 0.000000 -1.031433 1.471441
H7 0.000000 1.031433 -1.471441
H8 0.000000 -1.031433 -1.471441

(B2H6_.xyz):

8
B2H6_MPW1K-LACVP#+.01
B 0.00000 0.00000 0.87380
B 0.00000 0.00000 -0.87380
H -0.97570 0.00000 0.00000
H 0.97570 0.00000 0.00000
H 0.00000 1.03820 1.45010
H 0.00000 -1.03820 1.45010
H 0.00000 1.03820 -1.45010
H 0.00000 -1.03820 -1.45010
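
As pasted here, B2H6.xyz appears to start with a title line rather than the atom count that the second file carries on its first line. A small up-front check could turn such input into a readable message instead of a broadcasting error. This is a hedged sketch with a hypothetical function name, not the reader used by calculate_rmsd.py.

```python
def check_xyz_header(lines):
    """Return the atom count declared on line 1, or raise ValueError."""
    if not lines:
        raise ValueError("empty XYZ input")
    try:
        n_atoms = int(lines[0].split()[0])
    except (IndexError, ValueError):
        raise ValueError("line 1 should be the atom count, got: %r" % lines[0])
    # Line 2 is the comment line; atom records start on line 3
    n_found = len([entry for entry in lines[2:] if entry.strip()])
    if n_found != n_atoms:
        raise ValueError("header declares %d atoms, found %d" % (n_atoms, n_found))
    return n_atoms
```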

.pdb coordinate reader can be improved

The get_coordinates_pdb function in calculate_rmsd.py would benefit from the inclusion of 'HETATM' when reading in coordinates and atoms.

rmsd/rmsd/calculate_rmsd.py

Line 688 in cd8af49

if line.startswith("ATOM"):

I was able to change line 688 to:
if line.startswith("ATOM") or line.startswith("HETATM"):

This allowed the program to also read in .pdb's formatted with HETATM
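
The same check can be written with a tuple argument, which `str.startswith` accepts. A minimal sketch, with `select_coordinate_lines` as a hypothetical helper name:

```python
def select_coordinate_lines(lines):
    """Keep only the ATOM and HETATM records of a PDB file."""
    return [line for line in lines if line.startswith(("ATOM", "HETATM"))]


example = [
    "REMARK example",
    "ATOM      1  N   ALA A   1",
    "HETATM    2  O   HOH A   2",
    "TER",
]
print(len(select_coordinate_lines(example)))
```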

Function docstrings are a bit confusing about whether centroids are subtracted or not

This is a minor nitpick, but in particular the kabsch docstring is confusing (IMHO) in this respect:

rmsd/rmsd/calculate_rmsd.py

Lines 68 to 94 in 94b100c

    def kabsch(P, Q):
        """
        The optimal rotation matrix U is calculated and then used to rotate matrix
        P unto matrix Q so the minimum root-mean-square deviation (RMSD) can be
        calculated.

        Using the Kabsch algorithm with two sets of paired point P and Q, centered
        around the centroid. Each vector set is represented as an NxD
        matrix, where D is the the dimension of the space.

        The algorithm works in three steps:
        - a translation of P and Q
        - the computation of a covariance matrix C
        - computation of the optimal rotation matrix U

        http://en.wikipedia.org/wiki/Kabsch_algorithm

        Parameters
        ----------
        P : array
            (N,D) matrix, where N is points and D is dimension.
        Q : array
            (N,D) matrix, where N is points and D is dimension.

        Returns
        -------
        U : matrix
            Rotation matrix (D,D)

        Example
        -----
        TODO
        """

This mentions "a translation of P and Q", but this is not done in the function. So IMO the explanation of the Kabsch algorithm should be put somewhere else or differentiate between what is done in this function and what has to be done by its user, and it should be made clear that these functions (also e.g. quaternion_rotate) compute matrices for rotation around the origin, not around the centroids - which is great by the way, because it allows more general use of this library for finding optimal rotations around points other than the centroid.

I'll send a PR if you agree.

multiprocessing with calculate_rmsd.py

I would like to use calculate_rmsd.py in a series of comparisons with
"true clusters" (cf. vide infra), driven by a moderator script. For smaller
tests (one molecule A vs. one molecule B), a linear approach was fast
enough. Now that I am facing larger clusters, it occurred to me that the
tests could be performed in parallel on multiple CPU cores. My call of the
multiprocessing module, however, leads to a situation where each CPU seems
to treat the list of tests as its own, so that every core engaged performs
every task, even if another core has already determined that RMSD.

Question: Does another user of calculate_rmsd.py have experience with
parallelizing the script? Is there a keyword I have not spotted yet to
organize the computation within this loop?

Disclaimer: because this is the first time I am looking for a more
parallelized approach, approaches other than the multiprocessing module may
work as well. The intent is to run the analysis on files like the attached
ones for series of 50...150 test clusters each, with about a dozen molecules
per cluster, edited such that the script can work with them. This will
rapidly increase the computational cost (i.e., n * (n-1)/2 tests per n
clusters). Suggestions may also be helpful for related issues such as those
filed here.

simplifiedModerator.py.txt
modelData.zip
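
Without seeing the moderator script, one common pattern is to build the list of pairs once and let `Pool.map` hand each pair to exactly one worker. The sketch below shows only that distribution pattern: `compare` is a stand-in that returns a dummy score rather than an RMSD, and all names are hypothetical.

```python
from itertools import combinations
from multiprocessing import Pool


def all_pairs(names):
    """Every unordered pair exactly once: n * (n - 1) / 2 entries for n names."""
    return list(combinations(names, 2))


def compare(pair):
    """Stand-in for one RMSD evaluation (hypothetical; replace with the real call)."""
    a, b = pair
    return a, b, float(abs(len(a) - len(b)))  # dummy score, NOT an RMSD


def run_parallel(names, workers=2):
    """Distribute the pairs over worker processes; each pair is computed once."""
    with Pool(processes=workers) as pool:
        return pool.map(compare, all_pairs(names))
```

The call to `run_parallel` belongs under an `if __name__ == "__main__":` guard in the calling script, so that worker processes importing the module do not start the whole job again.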

reading .xyz files doesn't work when they contain scientific notation

for example, this file,

C -2.293946 -3e-06 -0.156296
C -1.072129 0.756112 -0.6106
C -1.072122 -0.756104 -0.610611
C -0.06715 -1.123802 0.475015
C 1.358616 -1.188877 -0.086502
O 1.677474 3e-06 -0.799713
C 1.358611 1.18888 -0.086499
C -0.067151 1.123797 0.475026
O -0.11043 -8e-06 1.371348
H -2.543398 -1.2e-05 0.89919
H -3.143761 -8e-06 -0.829798
H -1.086381 1.326009 -1.5315
H -1.086375 -1.325977 -1.531525
H -0.309135 -2.019187 1.056162
H 2.055688 -1.340367 0.752686
H 1.481047 -2.012895 -0.797179
H 2.055691 1.340376 0.752679
H 1.481026 2.012901 -0.797177
H -0.309129 2.019175 1.056186

gives the error message,
Reading the .xyz file failed in line 2. Please check the format.
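
Whatever causes the failure in the reader, Python's built-in `float()` accepts both fixed-point and scientific notation, so a token-based parse handles lines like the ones above. A minimal sketch with a hypothetical function name:

```python
def parse_xyz_line(line):
    """Split one XYZ atom line into (symbol, [x, y, z])."""
    tokens = line.split()
    if len(tokens) < 4:
        raise ValueError("expected 'symbol x y z', got: %r" % line)
    symbol = tokens[0]
    # float() parses "-0.156296" and "-3e-06" alike
    return symbol, [float(token) for token in tokens[1:4]]


print(parse_xyz_line("C -2.293946 -3e-06 -0.156296"))
```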

Can't install with pip

I started a new python2 virtualenv with only numpy installed.

I cloned this repository and changed to the pip branch.
Trying to install I get the following error

(test_rmsd_2) primdal at Kaspers-MBP in ~/dev/rmsd at [13:58] (pip)
$ pip install .
Processing /Users/primdal/dev/rmsd
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/6l/r_g4km7552zd0s6jyvlyn6100000gn/T/pip-zDQPdV-build/setup.py", line 11, in <module>
        from rmsd.calculate_rmsd import __version__
      File "rmsd/__init__.py", line 2, in <module>
        from rmsd.calculate_rmsd import *
      File "rmsd/calculate_rmsd.py", line 17, in <module>
        from builtins import range
    ImportError: No module named builtins

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/6l/r_g4km7552zd0s6jyvlyn6100000gn/T/pip-zDQPdV-build/

builtins seems to be a python3-only package

Doing the same with a python3 virtualenv is fine. No errors encountered
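
On Python 2 the `builtins` module is only available when the `future` package is installed. A guarded import is one possible workaround, shown here as a sketch rather than as the fix adopted by the project:

```python
try:
    from builtins import range  # Python 3, or Python 2 with the `future` package
except ImportError:
    range = xrange  # noqa: F821  (plain Python 2 without `future`)

print(list(range(3)))
```

Declaring `future` as an install requirement for Python 2 would be the alternative.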

Option to calculate RMSD without hydrogen atoms

Need an option to calculate RMSD for "heavy" atoms only, since in most cases the hydrogens are too flexible and do not matter in such comparisons.
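
A command quoted in another issue on this page passes `--no-hydrogen`, so the program seems to offer this already. For use from a script, the filtering step can be sketched as follows; `heavy_atoms` is a hypothetical helper that assumes element symbols and coordinates come as parallel lists, and it does not treat isotope labels such as D specially.

```python
def heavy_atoms(atoms, coords):
    """Drop hydrogens and return the remaining symbols and coordinates."""
    keep = [i for i, atom in enumerate(atoms) if atom.strip().upper() != "H"]
    return [atoms[i] for i in keep], [coords[i] for i in keep]


atoms = ["C", "H", "O", "h"]
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.2, 0.0), (0.0, 0.0, 1.0)]
print(heavy_atoms(atoms, coords))
```

Both structures must be filtered the same way so that the atom pairing stays intact.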
