Shape-based alignment of molecules using 3D point-based representation

License: BSD 3-Clause "New" or "Revised" License

Python 58.34% C 41.66%
cheminformatics bioinformatics computational-chemistry drug-design chemicals molecular-modeling molecules

sensaas's Introduction

SENSAAS

SENSAAS is shape-based alignment software that superimposes molecules. It is based on the publication SenSaaS: Shape-based Alignment by Registration of Colored Point-based Surfaces.

Documentation: Full documentation is available at https://github.com/SENSAAS/sensaas/blob/main/docs/index.rst

Website: A web demo to use SENSAAS or SENSAAS-Flex is available at https://chemoinfo.ipmc.cnrs.fr/SENSAAS/index.html

Tutorial: These videos on YouTube provide tutorials

Requirements

SENSAAS relies on the open-source library Open3D. The current release of SENSAAS uses Open3D version 0.12.0 along with Python3.7.

Visit the following URL for using Python packages distributed via PyPI: http://www.open3d.org/docs/release/getting_started.html or conda: https://anaconda.org/open3d-admin/open3d/files. For example, for windows-64, you can download win-64/open3d-0.12.0-py37_0.tar.bz2

Virtual environment for python with conda (for Windows for example)

Install conda or Miniconda from https://conda.io/miniconda.html
Launch Anaconda Prompt, then complete the installation:

conda update conda
conda create -n sensaas
conda activate sensaas
conda install python=3.7 numpy

Once Open3D is downloaded:

conda install open3d-0.12.0-py37_0.tar.bz2

(Optional) Additional packages for visualization with PyMOL:

conda install -c schrodinger -c conda-forge pymol-bundle

Retrieve and unzip SENSAAS repository in your desired folder. See below for running the program sensaas.py or meta-sensaas.py. The directory containing executables is called sensaas-main.

Linux

Install:

  1. Python3.7 and numpy
  2. Open3D version 0.12.0 (more information at http://www.open3d.org/docs/release/getting_started.html)

(Optional) Install additional packages for visualization with PyMOL:

  1. PyMOL (a molecular viewer; more information at https://pymolwiki.org)

Retrieve and unzip SENSAAS repository. The directory containing executables is called sensaas-main.

MacOS

Not tested

Information on the third-party program NSC

NSC is used to efficiently generate point clouds of molecules and to calculate their surfaces. It is written in C and was developed by Frank Eisenhaber who kindly licensed its use in SENSAAS. Please be advised that the use of NSC is strictly tied to SENSAAS and its code is released under the following license. If the NSC license is an issue for your application or if you wish to use NSC independently of SENSAAS, please contact the author Frank Eisenhaber (email: [email protected]) who will amicably manage your request.

References :

  1. F. Eisenhaber, P. Lijnzaad, P. Argos, M. Scharf, The Double Cubic Lattice Method: Efficient Approaches to Numerical Integration of Surface Area and Volume and to Dot Surface Contouring of Molecular Assemblies, Journal of Computational Chemistry, 1995, 16, N3, pp.273-284.
  2. F. Eisenhaber, P. Argos, Improved Strategy in Analytic Surface Calculation for Molecular Systems: Handling of Singularities and Computational Efficiency, Journal of Computational Chemistry, 1993,14, N11, pp.1272-1280.

Executables nsc (for Linux) and nsc-win.exe (for Windows) are included in this repository.

If they do not work on your system, you may have to compile NSC from the source file nsc-300.c in the directory src/. Instructions for compilation:

  1. for Windows: The current executable nsc-win.exe was compiled by using http://www.codeblocks.org. Rename the executable as nsc-win.exe because 'nsc-win.exe' is used to set the variable nscexe in the Python script sensaas.py

  2. for Linux:

     cc src/nsc-300.c -lm
    

    rename a.out as nsc because 'nsc' is used to set the variable nscexe in the Python script sensaas.py:

     cp a.out nsc
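
The naming rule above (nsc-win.exe on Windows, nsc otherwise) can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `nsc_executable_name` and `find_nsc` are hypothetical and are not taken from sensaas.py, which sets its own nscexe variable.

```python
import os
import platform


def nsc_executable_name(system=None):
    """Return the NSC executable name for an OS name (default: the current OS)."""
    system = system or platform.system()
    # 'nsc-win.exe' and 'nsc' are the two names given in this README.
    return "nsc-win.exe" if system == "Windows" else "nsc"


def find_nsc(directory, system=None):
    """Return the path of the NSC executable in `directory`, or None if unusable."""
    path = os.path.join(directory, nsc_executable_name(system))
    # The file must exist and be executable by the current user.
    if os.path.isfile(path) and os.access(path, os.X_OK):
        return path
    return None
```

A check like `find_nsc(".")` returning None before a run would indicate that the executable is missing or lacks execute permission, which is the situation in which recompiling may be needed.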
    

Run Sensaas

To align a Source molecule on a Target molecule, the syntax is:

sensaas.py sdf molecule-target.sdf sdf molecule-source.sdf slog.txt optim

Example:

sensaas.py sdf examples/IMATINIB.sdf sdf examples/IMATINIB_mv.sdf slog.txt optim

You may have to run the script as follows:

python sensaas.py sdf examples/IMATINIB.sdf sdf examples/IMATINIB_mv.sdf slog.txt optim

You can ignore the following warning from Open3D: "Open3D WARNING KDTreeFlann::SetRawData Failed due to no data.". It has been observed with conda on Windows.

Here, the source file IMATINIB_mv.sdf is aligned (moved) on the target file IMATINIB.sdf, which does not move. The output file tran.txt contains the transformation matrix that aligns the source file, and the aligned molecule is written to Source_tran.sdf. The slog.txt file details the results, with the final scores of the aligned molecule (Source) on the last line. In the current example, the last line should look like:

gfit= 1.000 cfit= 0.999 hfit= 0.996 gfit+hfit= 1.996

There are three different fitness scores but we only use 2 of them, gfit and hfit, to calculate gfit+hfit. More about Fitness scores

  • gfit score estimates the geometric matching of point-based surfaces - it ranges between 0 and 1

  • hfit score estimates the matching of colored points representing pharmacophore features - it ranges between 0 and 1

Thus, we calculate a hybrid score, gfit+hfit, as the sum of the gfit and hfit scores. It ranges between 0 and 2.

  • A gfit+hfit score close to 2.0 means a perfect superimposition.

  • A gfit+hfit score > 1.0 means that similarities were identified.
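
As an illustration of how the scores could be read programmatically, here is a minimal sketch that parses a score line of the form shown above. It assumes that the line consists of `name= value` pairs exactly as in the example; the function name `parse_scores` is hypothetical and is not part of SENSAAS.

```python
import re


def parse_scores(line):
    """Parse 'name= value' pairs from a score line into a dict of floats."""
    # A name is a run of non-space characters followed by '=', then a number.
    pairs = re.findall(r"(\S+?)=\s*([-+]?\d+(?:\.\d+)?)", line)
    return {name: float(value) for name, value in pairs}


scores = parse_scores("gfit= 1.000 cfit= 0.999 hfit= 0.996 gfit+hfit= 1.996")

# The hybrid score is the sum of gfit and hfit (cfit is not used in it).
hybrid = scores["gfit"] + scores["hfit"]

# A hybrid score above 1.0 means that similarities were identified.
similar = hybrid > 1.0
```

Only the 1.0 threshold stated above is encoded here; how close to 2.0 counts as a perfect superimposition is left to the reader.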

Run meta-sensaas.py

1. Virtual Screening

This script is suited to virtual screening of sdf files containing several molecules (database mode), for example an sdf file containing several conformers for the Target and/or the Source. It produces a similarity matrix along with an sdf file that contains all aligned Sources. The syntax is:

meta-sensaas.py molecules-target.sdf molecules-source.sdf

Example

The following example works with 2 files from the directory examples/

meta-sensaas.py examples/IMATINIB.sdf examples/IMATINIB_parts.sdf

You may have to run the script as follows:

python meta-sensaas.py examples/IMATINIB.sdf examples/IMATINIB_parts.sdf

Here, the source file IMATINIB_parts.sdf contains 3 substructures that are aligned (moved) on the target file IMATINIB.sdf (that does not move). Outputs are:

  • the file bestsensaas.sdf that contains the best ranked aligned Source
  • the file catsensaas.sdf that contains all aligned Sources
  • the file matrix-sensaas.txt that contains gfit+hfit scores (rows=Targets and columns=Sources)

Post-processing

Then, to ease the analysis of the results, the script utils/ordered-catsensaas.py can be used to generate files in descending order of score.

utils/ordered-catsensaas.py matrix-sensaas.txt catsensaas.sdf

You may have to run the script as follows:

python utils/ordered-catsensaas.py matrix-sensaas.txt catsensaas.sdf

or if you want to only retrieve solutions having a gfit+hfit score above a defined cutoff:

python utils/ordered-catsensaas-cutoff.py matrix-sensaas.txt catsensaas.sdf 1.1

Outputs are:

  • the file ordered-catsensaas.sdf that contains all aligned Sources in descending order of score
  • the file ordered-scores.txt that contains gfit+hfit scores in descending order
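
The ordering and cutoff logic described in this section can be sketched as follows. This is not the code of the utils/ scripts: the function name `rank_scores` is hypothetical, the scores are assumed to be already loaded into a Python list (the layout of matrix-sensaas.txt is not covered here), and the example values are made up.

```python
def rank_scores(scores, cutoff=None):
    """Return (source_number, score) pairs in descending order of score.

    Sources are numbered from 1 in input order. If `cutoff` is given,
    only scores strictly above it are kept.
    """
    ranked = sorted(enumerate(scores, start=1), key=lambda pair: pair[1], reverse=True)
    if cutoff is not None:
        ranked = [(number, score) for number, score in ranked if score > cutoff]
    return ranked


# Hypothetical gfit+hfit scores for three aligned Sources.
example = [1.05, 1.62, 0.87]
all_ranked = rank_scores(example)
above_cutoff = rank_scores(example, cutoff=1.1)
```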

Option -s

When executing meta-sensaas.py, you can also select the score type by using the option -s (default is the score of the Source (-s source)):

meta-sensaas.py molecules-target.sdf molecules-source.sdf -s mean

Here, the mean of the score of the Target and the score of the aligned Source is used to rank solutions and to fill matrix-sensaas.txt. The option '-s mean' is useful to favor Source molecules that are similar in size to the Target. More about Options
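
To see why the mean favors Sources of similar size, consider the following minimal sketch. It assumes that each alignment yields one score for the Source and one for the Target; the function name `ranking_score` and the numbers are hypothetical, and only the two modes named above are modeled.

```python
def ranking_score(source_score, target_score, mode="source"):
    """Return the score used for ranking under the given -s mode."""
    if mode == "source":
        return source_score
    if mode == "mean":
        return (source_score + target_score) / 2.0
    # Other modes may exist in meta-sensaas.py; they are not modeled here.
    raise ValueError(f"unsupported mode: {mode}")


# Hypothetical values: a small fragment matches well from its own point of
# view but covers little of the Target; a similar-sized molecule does both.
fragment = {"source": 1.8, "target": 0.6}
similar_size = {"source": 1.5, "target": 1.4}

fragment_mean = ranking_score(fragment["source"], fragment["target"], "mean")
similar_mean = ranking_score(similar_size["source"], similar_size["target"], "mean")
```

With these made-up numbers the fragment ranks first under `-s source`, while the similar-sized molecule ranks first under `-s mean`.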

2. Finding alternative alignments and Clustering

The option -r repeats the alignment in order to find alternative alignments when they exist, for example when aligning a fragment on a large molecule. The syntax is:

meta-sensaas.py target.sdf source.sdf -r 10

Here, 10 alignments of the Source will be generated and clustered. Outputs are:

  • the file sensaas-1.sdf with the best ranked alignment - it contains 2 molecules: first is Target and second the aligned Source
  • the file sensaas-2.sdf (if exists) with the second best ranked alignment - it contains 2 molecules: first is Target and second the aligned Source
  • ...
  • file cat-repeats.sdf that contains all aligned Sources

Example

The following example works with 2 files from the directory examples/

meta-sensaas.py examples/VALSARTAN.sdf examples/tetrazole.sdf -r 100

You may have to run the script as follows:

python meta-sensaas.py examples/VALSARTAN.sdf examples/tetrazole.sdf -r 100

As described in the publication, outputs are:

  • sensaas-1.sdf contains the self-matching superimposition
  • sensaas-2.sdf contains the bioisosteric superimposition
  • sensaas-3.sdf contains the geometric-only superimposition

Visualization

You can use any molecular viewer. For instance, you can use PyMOL if installed (see optional packages) to load the Target and the aligned Source(s):

after aligning IMATINIB_mv.sdf on IMATINIB.sdf using sensaas.py:

pymol examples/IMATINIB.sdf Source_tran.sdf 

or after executing meta-sensaas.py with several molecules:

pymol examples/IMATINIB.sdf bestsensaas.sdf catsensaas.sdf

or after the post-processing:

pymol examples/IMATINIB.sdf ordered-catsensaas.sdf

or after executing meta-sensaas.py with the repeat option (State 1 is Target and State 2 is the aligned Source):

pymol examples/VALSARTAN.sdf sensaas-1.sdf

Licenses

  1. SENSAAS code is released under the 3-Clause BSD License

  2. NSC code is released under the following license

Copyright

Copyright (c) 2018-2021, CNRS, Inserm, Université Côte d'Azur, Dominique Douguet and Frédéric Payan, All rights reserved.

Reference

Douguet D. and Payan F., SenSaaS: Shape-based Alignment by Registration of Colored Point-based Surfaces, Molecular Informatics, 2020, 39(8), 2000081. doi: 10.1002/minf.202000081

Bibtex format :

@article{10.1002/minf.202000081,
author 		= {Douguet, Dominique and Payan, Frédéric},
title 		= {SenSaaS: Shape-based Alignment by Registration of Colored Point-based Surfaces},
journal 	= {Molecular Informatics},
volume 		= {39},
number 		= {8},
pages 		= {2000081},
keywords 	= {Shape-based alignment, molecular surfaces, point clouds, registration, molecular similarity},
doi 		= {https://doi.org/10.1002/minf.202000081},
url 		= {https://onlinelibrary.wiley.com/doi/abs/10.1002/minf.202000081},
eprint 		= {https://onlinelibrary.wiley.com/doi/pdf/10.1002/minf.202000081},
year 		= {2020}
}

sensaas's People

Contributors

douguet, fpayani3s, lucasgrandmougin

sensaas's Issues

IndexError: list index out of range

Hi
I ran sensaas on my MacBook
but I got an IndexError:

Traceback (most recent call last):
File "sensaas.py", line 59, in <module>
output=sys.argv[5]
IndexError: list index out of range

could u please tell me how to solve this?
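
An IndexError on sys.argv[5] indicates that fewer arguments were given than the script reads. A minimal sketch of a check that reports this more clearly is shown below. It assumes the six positional arguments of the syntax line in the README; the function name `check_arguments` is hypothetical and is not part of sensaas.py.

```python
USAGE = "sensaas.py sdf molecule-target.sdf sdf molecule-source.sdf slog.txt optim"
N_ARGS = 6  # assumption: the six positional arguments shown in the README syntax


def check_arguments(argv):
    """Return an error message if too few arguments were given, else None."""
    n_given = len(argv) - 1  # argv[0] is the script name
    if n_given < N_ARGS:
        return f"expected {N_ARGS} arguments, got {n_given}\nusage: {USAGE}"
    return None


# Example: the file types ('sdf') and the trailing arguments were omitted.
message = check_arguments(["sensaas.py", "target.sdf", "source.sdf"])
```

Running the command with the full syntax, for example `python sensaas.py sdf examples/IMATINIB.sdf sdf examples/IMATINIB_mv.sdf slog.txt optim`, supplies all six arguments.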

xyzrgb2dotspdb produces points on a plane

Using xyzrgb2dotspdb.py

python utils/xyzrgb2dotspdb.py examples/VALSARTAN.xyzrgb

to convert an xyzrgb file into a pdb file for visualization with PyMOL produces points on a plane:
[image: bug]

FileNotFoundError

(sensaas) xxxMacBook-Pro:sensaas xxx$ python sensaas.py sdf biggerpocket.sdf sdf newgen.sdf slog.txt optim
sh: nsc: command not found
Traceback (most recent call last):
File "sensaas.py", line 108, in <module>
pcdxyz,pcdrgb,pcd1xyz,pcd1rgb,pcd2xyz,pcd2rgb,pcd3xyz,pcd3rgb,pcd4xyz,pcd4rgb = sdfsurface(target,nscexe)
File "/Users/xiaoshandian/Desktop/sensaas/SDFtoDots.py", line 139, in sdfsurface
psaOut=open('psa.out', 'r')
FileNotFoundError: [Errno 2] No such file or directory: 'psa.out'

Hi :) how can I solve this?

meta-sensaas.py fails with FileNotFoundError

When running meta-sensaas.py from the main directory as described in the documentation:

python meta-sensaas.py examples/IMATINIB.sdf examples/IMATINIB_parts.sdf

I get the following error:

sh: sensaas.py: command not found
Traceback (most recent call last):
  File "meta-sensaas.py", line 526, in <module>
    logfile=open('slog', 'r')
FileNotFoundError: [Errno 2] No such file or directory: 'slog'

I think this occurs because sensaas.py is not necessarily executable or in PATH and therefore it should be invoked as python sensaas.py?

Meta-sensaas.py Permission denied Issue

Installed sensaas package in Ubuntu WSL with all the dependent packages.

When I run meta-sensaas.py using the following command from a folder called test:

python ../meta-sensaas.py ../examples/IMATINIB.sdf ../examples/IMATINIB_parts.sdf

I get the following error message

sh: 1: ../sensaas.py: Permission denied
Traceback (most recent call last):
File "../meta-sensaas.py", line 536, in <module>
logfile=open('slog', 'r')
FileNotFoundError: [Errno 2] No such file or directory: 'slog'

What could be the problem? How to fix it?
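
Both "command not found" and "Permission denied" arise when a script is launched directly by the shell without being on PATH or without execute permission. One possible way around this, sketched below, is to launch the script through the running Python interpreter. This is an assumption about a workaround, not the project's fix; the helper names `build_command` and `run_script` are hypothetical.

```python
import os
import subprocess
import sys


def build_command(script_dir, script_name, *args):
    """Build an argument list that runs a script with the current interpreter."""
    # sys.executable is the interpreter running this code, so the script
    # needs neither execute permission nor an entry in PATH.
    return [sys.executable, os.path.join(script_dir, script_name), *args]


def run_script(script_dir, script_name, *args):
    """Run the script and return the CompletedProcess (output is captured)."""
    command = build_command(script_dir, script_name, *args)
    return subprocess.run(command, capture_output=True, text=True)
```

Alternatively, on Linux the execute permission can be granted with `chmod +x sensaas.py`.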

Add scores to output file

It would be useful to add the scores (especially gfit+hfit) to the output file, where supported. For example, the score can be added as a data item in SDF files, which would make some post-processing tasks a lot easier (instead of having to read the molecular file Source_tran.sdf and extract the corresponding score from slog.txt).
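
A minimal sketch of what such a post-processing step could look like is given below. It assumes a single SDF record that ends with the usual `$$$$` terminator; the function name `add_sdf_data_item` is hypothetical and this is not an existing SENSAAS feature.

```python
def add_sdf_data_item(sdf_record, name, value):
    """Insert a '> <name>' data item before the '$$$$' line of one SDF record."""
    lines = sdf_record.rstrip("\n").split("\n")
    if lines[-1].strip() != "$$$$":
        raise ValueError("record does not end with $$$$")
    # An SDF data item is a header line, the value, and a blank line.
    item = [f"> <{name}>", str(value), ""]
    return "\n".join(lines[:-1] + item + ["$$$$"]) + "\n"


# Hypothetical, empty molecule record used only to show the resulting layout.
record = "mol\n\n\n  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n$$$$\n"
tagged = add_sdf_data_item(record, "gfit+hfit", "1.996")
```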

Redundant estimation of normals?

I think there might be a redundant estimation of normals. Normals are estimated when preparing the dataset

pcd_down.estimate_normals(o3d.geometry.KDTreeSearchParamHybrid(radius = radius_normal, max_nn = 30))

but they are also (redundantly?) computed in the colored_point_cloud() function

sensaas/GCICP.py

Lines 63 to 64 in 08c97b8

source_down.estimate_normals(o3d.geometry.KDTreeSearchParamHybrid(radius = voxel_size * 2, max_nn = 30))
target_down.estimate_normals(o3d.geometry.KDTreeSearchParamHybrid(radius = voxel_size * 2, max_nn = 30))

to which the prepared point clouds are passed (in gcicp_registration()). Is this evaluation redundant or am I missing something?

meta-sensaas.py fails when sensaas.py fails

sensaas.py is called in meta-sensaas.py using os.system() and a failure in sensaas.py leads to early termination of the parent script meta-sensaas.py. This happens for example when performing virtual screening and no correspondence is found between the source and target point clouds:

RuntimeError: [Open3D Error] (virtual Eigen::Matrix4d open3d::pipelines::registration::TransformationEstimationForColoredICP::ComputeTransformation(const open3d::geometry::PointCloud&, const open3d::geometry::PointCloud&, const CorrespondenceSet&) const) /home/runner/work/Open3D/Open3D/cpp/open3d/pipelines/registration/ColoredICP.cpp:124: No correspondences found between source and target pointcloud.

Traceback (most recent call last):
  File "../../sensaas/meta-sensaas.py", line 592, in <module>
    if(bestgfithfit < float(scoregfithfit[k])):
ValueError: could not convert string to float: '0.157)'

I think it would be great if meta-sensaas.py could detect such failures and simply skip the system (with a score of zero or -np.Inf), so that the remaining systems in the SDF file can be screened as well.
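
The suggestion could be sketched as follows. This is a sketch under assumptions, not the project's code: it assumes the child process exits with a non-zero status on failure (as an uncaught Python exception does), and the names `score_or_default` and `read_score` are hypothetical.

```python
import math
import subprocess
import sys


def score_or_default(command, read_score, default=-math.inf):
    """Run `command`; return read_score() on success, `default` on failure."""
    result = subprocess.run(command)
    if result.returncode != 0:
        return default  # the child process failed, so skip this system
    try:
        return read_score()
    except (OSError, ValueError):
        # Missing log file, or a malformed value such as '0.157)'.
        return default


# A stand-in for a failing alignment run.
failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
skipped = score_or_default(failing, lambda: 1.5)
```

Unlike os.system(), subprocess.run() takes the command as a list, so file names containing spaces need no extra quoting.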

Allow user to name output and temporary files

Currently meta-sensaas.py has hard-coded output file names

sensaas/meta-sensaas.py

Lines 495 to 497 in 82dafb7

output="catsensaas.sdf"
outputbest="bestsensaas.sdf"
outputmatrix="matrix-sensaas.txt"

as well as temporary file names:

sensaas/meta-sensaas.py

Lines 511 to 518 in 82dafb7

while(i < nbt):
    i=i+1
    searchsdfi(target,i,"tmpt.sdf")
    #print ("Target no %s" % i)
    j=0
    while(j < nbs):
        j=j+1
        searchsdfi(source,j,"tmps.sdf")
This prevents running multiple instances of SENSAAS in the same directory, because of conflicting file names. It would be great to be able to specify a prefix for the output and temporary files, so that clashes are avoided.
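
A minimal sketch of the requested behavior is shown below. The function name `prefixed_names` is hypothetical and no such option exists in meta-sensaas.py according to this report; the file names are the hard-coded ones quoted above.

```python
def prefixed_names(prefix=""):
    """Return output and temporary file names with an optional prefix."""
    # Keys follow the variable names quoted from meta-sensaas.py; the two
    # temporary names are the ones passed to searchsdfi().
    names = {
        "output": "catsensaas.sdf",
        "outputbest": "bestsensaas.sdf",
        "outputmatrix": "matrix-sensaas.txt",
        "tmp_target": "tmpt.sdf",
        "tmp_source": "tmps.sdf",
    }
    return {key: prefix + name for key, name in names.items()}


run1 = prefixed_names("run1-")
run2 = prefixed_names("run2-")
```

Until such an option exists, running each instance in its own working directory avoids the clash.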
