
Comments (8)

nbehrnd commented on May 18, 2024

from rmsd.

charnley commented on May 18, 2024

Hi @tccyl ,

Where is the PDB file from? From rcsb.org?

I am not a heavy user of the .pdb file format. I've read the ATOM section of the format documentation (http://www.wwpdb.org/documentation/file-format-content/format33/sect9.html#ATOM), and it seems PDB is based on fixed column widths, not on whitespace-separated fields as the current space.split implementation assumes.

You are very welcome to make a pull request solving this formatting, including a .pdb file where rmsd fails.


nbehrnd commented on May 18, 2024

Dear Jimmy,

I sometimes use calculate_rmsd.py for data derived from single crystal
diffraction, natively deposited as .cif, a format openbabel reads as well.
I can second @tccyl, as well as the documentation in and around the script:
not all .pdb files are equally well suited to pass the Kabsch test. I tentatively
attribute some of the issues to differences in both their formatting and their
content.

Converting .cif to .pdb with openbabel yields files generally unsuitable for
calculate_rmsd.py. Which is why I typically either

  • convert them further to .xyz with openbabel, then passing successfully; or

  • deploy Olex2 to write either .pdb, or .xyz. Both
    types interact well with calculate_rmsd.py. Because it was not perceived as an obstacle,
    I didn't spend additional time on this issue.

Possibly some of the documentation attached may illustrate the experience.

2019-Jun-07_calculate_rmsd_pdb_corrected.zip


tccyl commented on May 18, 2024

Probably the issue is the missing space between the y- and z-components of the coordinates. Instead of a manual correction, such an omission may be corrected on the CLI with openbabel (openbabel.org) in a pattern of

babel -ipdb notworking.pdb -opdb now_working.pdb

where -ipdb defines the input format as .pdb and, similarly, -opdb specifies the output format as .pdb. Depending on the version of calculate_rmsd.py, the .pdb generated by openbabel might not work well. In that case, and if you do not need crystallographic information like space group symmetry, you may be better off with the least complex file type instead, .xyz. A call from the terminal in the pattern of

obabel *.pdb -oxyz -m

will then convert all .pdb in your directory into .xyz files in one batch. Give this a try, and if it does not work, post again.

Norwid

On Wed, 05 Jun 2019 08:56:32 -0700 tccyl @. wrote:
>     if x_column == None:
>         try:
>             # look for x column
>             for i, x in enumerate(tokens):
>                 if "." in x and "." in tokens[i + 1] and "." in tokens[i + 2]:
>                     x_column = i
>                     break
>         except IndexError:
>             exit("error: Parsing coordinates for the following line: \n{0:s}".format(line))
>
> If the pdb line is like 'ATOM 383 C6 C B 122 -2.217 -2.542-103.749' (the values of y and z are connected), the code will exit, and the coordinates cannot be obtained.

Hi, nbehrnd,
Thanks for your nice suggestion. However, using openbabel to convert the pdb format still does not solve this issue.
As @charnley said, the pdb format is based on column widths, not on space.split as currently implemented. In both the pdb files from rcsb and those written by openbabel, the x, y, z coordinates occupy the same columns and follow the format:
    try:
        x = line[30:38]
        y = line[38:46]
        z = line[46:54]
        V.append(np.asarray([x, y, z], dtype=float))
    except ValueError:
        exit("error: Parsing coordinates for the following line: \n{0:s}".format(line))
Maybe the x, y, z coordinates could be read directly with the code above instead of by looking for x_column.
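
To illustrate the difference between the two approaches, here is a minimal, self-contained sketch. It is not the code of calculate_rmsd.py; the function name parse_atom_coordinates and the record built below are hypothetical, and the record only mirrors the line quoted above by placing each field in the columns given by the PDB format documentation.

```python
def parse_atom_coordinates(line):
    """Read x, y, z from the fixed columns of a PDB ATOM/HETATM record.

    The PDB format assigns x, y, z to columns 31-38, 39-46 and 47-54
    (1-based), which are the 0-based slices used below.
    """
    return float(line[30:38]), float(line[38:46]), float(line[46:54])


# Hypothetical record, laid out field by field to mirror the line quoted
# in this thread. y and z touch because z needs all 8 characters.
record = (
    "{:<6s}{:>5d} {:<4s}{:1s}{:>3s} {:1s}{:>4d}{:1s}   "
    "{:8.3f}{:8.3f}{:8.3f}"
).format("ATOM", 383, " C6", "", "C", "B", 122, "", -2.217, -2.542, -103.749)

# Whitespace splitting merges y and z into a single token ...
print(record.split()[-1])

# ... while the fixed columns recover all three values.
print(parse_atom_coordinates(record))
```

Slicing by column does not depend on any whitespace between the fields, which is why it still works when a wide z value fills its whole field.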


tccyl commented on May 18, 2024

> Where is the PDB file from? From rcsb.org?

Yes, it is from rcsb.org.


nbehrnd commented on May 18, 2024

Hi @tccyl ,

it seems my reply by email earlier didn't pass through. Anyway, meanwhile, there was
some work on the script, aiming to enable .pdb written by the popular openbabel to
pass the Kabsch test because the current version 1.3.2 (released in January 2019) does
not work successfully with .pdb by openbabel.

For a small test molecule (benzamide) consisting of C, H, N, and O, adding some
keywords to the instructions in the script now allows it to work with such files. The
change is deposited here and also as
pull request #58, including additional test data
(.pdb newly written by openbabel) known to work, too. It is still labeled as version 1.3.2 (Jan
2019), awaiting an action by Jimmy.

Meanwhile, give it a try; perhaps your (test) data reveal additional keywords that should be
added, too. You are welcome to deposit your two files in question here. You need to know that
there are multiple 'dialects' of .pdb files around, which contributes to the issues here (which is why
.xyz remains a fallback, at some expense, of course).


tccyl commented on May 18, 2024

@nbehrnd Thank you so much~
The two example files where rmsd failed are below:
two_fragment_files.zip


nbehrnd commented on May 18, 2024

Hi @tccyl

in short, after passing the .pdb through openbabel, calculate_rmsd.py determines an RMSD
of about 0.7983 for each variant of the Kabsch test. Attached below are the copy of the script
used and the supporting documentation (two .zip).

The detailed story:
An initial inspection of the files in an editor revealed that both describe the same number
of atoms per atom type. The subsequent check in avogadro revealed that the mutual distances
of these atoms exceed the van der Waals radii, so in this view the atoms are not adjacent to each other.
In their original form, the two files are not suitable for a Kabsch test with either the current
version of calculate_rmsd.py (1.3.2, January 2019) or my changes from last week.

I passed your .pdb to openbabel (version 2.4.1 by November 2018) to be rewritten:

babel -ipdb 4L81_10_CPCN.pdb -opdb 4L81_10_CPCN_babel.pdb
babel -ipdb 4L81_11_CPCN.pdb -opdb 4L81_11_CPCN_babel.pdb 

In both instances, openbabel indicated difficulties working with the original data. This suggests
the export from the original source file should be revised, which obviously is not the topic of this
thread. One of the error logs is included as error.log.

However, atom label, (x,y,z), and atom type seem to pass into the newly written .pdb, which
still lacks the space between the y- and z-components of the
coordinates. This may be characteristic of protein data, as opposed to small-molecule data.

[image: diffView]

The openbabel-written .pdb then passed each of the three variants of the Kabsch
test smoothly with the --reorder option engaged (default / classical Kabsch test, --use-reflections,
--use-reflections-keep-stereo), with the same numerical RMSD of about 0.7983. As a
comparison, the .pdb were converted with babel into .xyz; again, the Kabsch tests state an
RMSD of about 0.7983.

With the .xyz in hand, it may be interesting to inspect the 'best alignment' of the two
selected sets of atoms. Using 4L81_10_CPCN_babel.xyz as fixed model_A, and 4L81_11_CPCN_babel.xyz as model_B to be aligned with respect to model_A, the new
coordinates of model_B were harvested by

python3 calculate_rmsd.py --reorder -p 4L81_10_CPCN_babel.xyz 4L81_11_CPCN_babel.xyz > new_alignment_11.xyz

Both model_A and the update of model_B (new_alignment_11.xyz) were read by jmol. The corresponding selections of atoms were connected manually ('connect strut' instruction after selection
of the atoms in question) with struts colored either red (model_A) or blue (new_alignment_11 / updated model_B), labelled (model_A red, model_B blue) in an otherwise cpk-color scheme. They were
exported as static .png and interactive .wrl (e.g., view3dscene) to walk around the superposition.

[image: alignment]

The labeling in jmol's display of the superposition is worth a word:
Except for the two opposite termini, the atoms were labeled in a pattern of C1/1.1 #1, where C1
stands for the first carbon atom in (1.1) the first model of the first file read. In the same way, 2.1
refers to the first model in the second file read by jmol. #1 then refers to the first atom read in this
model, a count independent of the atom type or atom label met in the file read.

rmsd-babel_issue.zip
reporting.zip
