hesther / espsim
Scoring of shape and ESP similarity with RDKit
License: MIT License
Hi-
I love the work you've done on this package. I've been toying around with it and noticed that it can produce stochastic results. Would you welcome a PR that adds a seed parameter to all of the top-level functions and passes it on to the conformer generation functions? This would be similar to the parameter in your function ConstrainedEmbedMultipleConfs, but passed on in the rest of the functions as well.
Hi @hesther ,
Appreciate your work. I am having difficulty understanding the ESP similarity formula at the end of short_demonstration.ipynb.
It seems to focus on sign agreement over all of space, so why not use the integral of phi_a * phi_b / (|phi_a|*|phi_b|)? Maybe I have it wrong. What is the intuition behind the formula?
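For reference, the quantities in question can be written side by side. The Carbo and Tanimoto forms below are the standard textbook definitions and are only assumed to match what the notebook uses; the last line is the sign-only expression proposed in the question.

```latex
S_{\mathrm{Carbo}} =
  \frac{\int \phi_A \phi_B \, d\mathbf{r}}
       {\sqrt{\int \phi_A^2 \, d\mathbf{r}} \, \sqrt{\int \phi_B^2 \, d\mathbf{r}}}
\qquad
S_{\mathrm{Tanimoto}} =
  \frac{\int \phi_A \phi_B \, d\mathbf{r}}
       {\int \phi_A^2 \, d\mathbf{r} + \int \phi_B^2 \, d\mathbf{r} - \int \phi_A \phi_B \, d\mathbf{r}}
\qquad
S_{\mathrm{sign}} =
  \int \frac{\phi_A \phi_B}{|\phi_A| \, |\phi_B|} \, d\mathbf{r}
```

In the first two forms each region of space contributes in proportion to the magnitude of both potentials, so strongly charged regions dominate. The sign-only form gives every point the same weight of +1 or -1, including points where both potentials are close to zero, and it is not bounded unless it is also divided by the integration volume.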
Hi,
I really like the ESP-Sim package, but I would like to speed up the calculations for larger datasets. I see that ESP-Sim only uses one CPU core on my machine. Is there an easy way to set the number of cores I want to use for ESP-Sim calculations?
(Sorry if this is a noob question)
Thank you in advance!
Robin
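One generic way to use several cores, independent of anything espsim provides, is to split the list of molecule pairs across worker processes with the Python standard library. The sketch below is only an illustration: `score_pair` is a hypothetical placeholder for whatever single-pair scoring call is used, and its arithmetic body exists only so the example runs without RDKit or espsim installed.

```python
from concurrent.futures import ProcessPoolExecutor


def score_pair(pair):
    """Hypothetical stand-in for one scoring call on a (probe, reference) pair."""
    probe, ref = pair
    # Placeholder arithmetic; replace with the real similarity calculation.
    return abs(probe - ref)


def score_all(pairs, workers=1):
    """Score every pair, optionally spreading the work over several processes."""
    if workers <= 1:
        # Serial fallback, useful for debugging.
        return [score_pair(p) for p in pairs]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with `pairs`.
        return list(pool.map(score_pair, pairs, chunksize=16))
```

When calling `score_all(pairs, workers=4)` from a script, the call should sit under an `if __name__ == "__main__":` guard on platforms that start workers by spawning a new interpreter. Arguments and return values must be picklable, since they are sent between processes.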
Hey Hesther,
Looking at all the appreciation and its presence in various other sources, I feel at a loss not being able to use it.
Can you please help me install it in Google Colab?
Thanks in advance
-Hemant
hello esther,
(I found a fix, please see my comment, but I don't know the root cause)
i've been trying out the espsim library on some molecules. the demo notebooks are very helpful and the analyses in the paper are also nicely done. great work!
i was trying to compare the ESP similarity of ATP against another known inhibitor of CDK2.
Specifically, I used the ATP co-crystal ligand from PDB ID 1B38
(https://models.rcsb.org/v1/1b38/ligand?auth_seq_id=381&label_asym_id=C&encoding=sdf&filename=1b38_C_ATP.sdf)
and the Dinaciclib co-crystal ligand from PDB ID 5L2W
(https://models.rcsb.org/v1/5l2w/ligand?auth_seq_id=900&label_asym_id=C&encoding=sdf&filename=5l2w_C_1QK.sdf)
I am trying EmbedAlignScore() on these 2 mols, and the calculation works when I use gasteiger, mmff, ml, but not resp.
psi4 complains about this error:
RuntimeError:
Fatal Error: RHF: RHF reference is only for singlets.
Error occurred in file: /build/source/psi4/src/psi4/libscf_solver/rhf.cc on line: 92
The most recent 5 function calls were:
psi::PsiException::PsiException(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, char const*, int)
I've also attached the detailed run log for your reference: https://gist.github.com/linminhtoo/84182da4bf727361b23905f34a429a5d
Here is how I ran the code (I did rename some variables for my convenience but I didn't change any code logic)
# load co-crystal ATP
mol_atp = Chem.SDMolSupplier(str(RELATIVE / "data/cdk2_knownhits/1b38_C_ATP.sdf"), removeHs=False)[0]
mol_atp = Chem.AddHs(mol_atp, addCoords=True)
# load co-crystal dinaciclib
mol_dina = Chem.SDMolSupplier(str(RELATIVE / "data/cdk2_knownhits/5l2w_C_1QK.sdf"), removeHs=False)[0]
mol_dina = Chem.AddHs(mol_dina, addCoords=True)
# run ESPSim
shape_sims, esp_sims = EmbedAlignScore(
    probe_mol=deepcopy(mol_dina),
    ref_mols=deepcopy(mol_atp),
    probe_num_confs=10,
    ref_num_confs=10,
    partial_charge_mode="resp",
    renormalize=True,  # to [0, 1]
    getBestESP=True,  # more accurate but slower
    randomseed=2342,
)
I tried swapping probe_mol to mol_atp and ref_mols to mol_dina, and psi4 did calculate the charges for one molecule before crashing again with the same error, which means it is not happy with Dinaciclib for some reason, but is fine with ATP (assuming it calculates probe_mol first).
Do you happen to have any idea what the issue is?
Best,
Min Htoo
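A restricted (RHF) reference describes a closed-shell singlet, which requires an even number of electrons. One possible cause, offered here as an assumption and not confirmed for this report, is that the atoms and net charge handed to psi4 add up to an odd electron count, for example if hydrogens or formal charges do not survive reading the PDB ligand file. A quick parity check can rule this in or out; the function names below are illustrative and not part of espsim or psi4.

```python
def electron_count(atomic_numbers, net_charge=0):
    """Total electrons: sum of nuclear charges minus the net molecular charge."""
    return sum(atomic_numbers) - net_charge


def allows_closed_shell_singlet(atomic_numbers, net_charge=0):
    """True if the electron count is even, a precondition for an RHF singlet."""
    return electron_count(atomic_numbers, net_charge) % 2 == 0


# Water (O, H, H) has 10 electrons, so a closed-shell singlet is possible.
water_ok = allows_closed_shell_singlet([8, 1, 1])
# A neutral OH fragment has 9 electrons and cannot be a closed-shell singlet.
oh_radical_ok = allows_closed_shell_singlet([8, 1])
```

With RDKit, the atomic numbers can be collected with `atom.GetAtomicNum()` over `mol.GetAtoms()` after adding hydrogens, and the net charge with `Chem.GetFormalCharge(mol)`.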
Hello @hesther, thank you for this nice tool!
I would like to try ML-based charges, but it doesn't work for me:
752 if train_args.features_scaling != predict_args.features_scaling:
753 raise ValueError('If scaling of the additional features was done during training, the '
754 'same must be done during prediction.')
756 # If atom descriptors were used during training, they must be used when predicting and vice-versa
AttributeError: 'Namespace' object has no attribute 'features_scaling'
Am I missing some requirements?
Thank you++
Hello,
I just downloaded this package and am excited to use it. However, when I run benchmark_1_partial_charges.ipynb I'm getting this warning for the ML predictions:
Computing ML charges (will print three warnings for failed predictions)
Warning: could not obtain prediction, defaulting to Gasteiger charges for one molecule
Warning: could not obtain prediction, defaulting to Gasteiger charges for one molecule
Warning: could not obtain prediction, defaulting to Gasteiger charges for one molecule
Do I need to worry about this?
Thanks and have a nice day!
Hi, thank you for the great work.
I ran into a problem when I tried to run through the example notebook with partialCharges='resp'. The error is about the key-values of options in helper.py. I tried to modify it by removing the list in line 82. Then the charge output becomes only one value rather than a list/array in line 51 of electrostatics.py. I am not an expert in Psi4; I hope this description helps you understand the issue.
I would appreciate it if you can help, :D!
Best,
Chao
Hello, thanks for this great package!
Do you have any plans to make this package available through conda or pip?
Tharindu
Would it be possible to push a git tag or a PyPI release?
I might use it to make a conda package of espsim.
ping @hesther
Hi,
I am interested in using this library to compare docking poses generated with a consensus docking approach. Essentially, I would like to compute the electrostatic field similarity between different poses of a single compound and use the similarity data to cluster the poses, much as you would with RMSD values. Is it possible to calculate the field similarity without prior alignment of the two molecules, since the docked geometry of each pose needs to be preserved? The two molecules I would be comparing are the same compound, just in different conformations.
Thanks for any insight you may have,
Tony
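Once pairwise similarities between poses are available, clustering them works much like RMSD-based clustering with the inequality reversed. The sketch below assumes the similarities were computed on the poses in their docked frames (for instance with a scoring call that takes coordinates as given, which is not verified here) and stored in a symmetric list of lists. The greedy leader scheme and the 0.8 threshold are illustrative choices, not something espsim prescribes.

```python
def cluster_by_similarity(sim, threshold=0.8):
    """Greedy leader clustering on a symmetric similarity matrix.

    A pose joins the first cluster whose leader (first member) it resembles
    with similarity >= threshold; otherwise it starts a new cluster.
    """
    clusters = []
    for i in range(len(sim)):
        for cluster in clusters:
            leader = cluster[0]
            if sim[i][leader] >= threshold:
                cluster.append(i)
                break
        else:
            # No existing leader was similar enough.
            clusters.append([i])
    return clusters


# Three poses: 0 and 1 are alike, 2 differs from both.
example_sim = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
example_clusters = cluster_by_similarity(example_sim)
```

The result depends on the order of the poses, so sorting them by docking score first makes the best-scored pose of each group its leader.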
Hi, thanks for this great package!
I wanted to speed it up for use in large-scale screening, so vectorized the gaussian integration step:
espsim/espsim/electrostatics.py
Line 114 in 463289c
Here's a minimal-ish example, taken from the scripts dir:
setup:
## Setup:
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
from espsim import EmbedAlignConstrainedScore,ConstrainedEmbedMultipleConfs,GetEspSim, helpers
import numpy as np
from scipy.spatial.distance import cdist
## set up molecules:
refSmiles=['C1=CC=C(C=C1)C(C(=O)O)O','CCC(C(=O)O)O','OC(C(O)=O)c1ccc(Cl)cc1','C1=CC(=CC=C1C(C(=O)O)O)O','COc1ccc(cc1)C(O)C(O)=O','OC(C(O)=O)c1ccc(cc1)[N+]([O-])=O','CCCC(C(=O)O)O','CCC(C)C(C(=O)O)O','CC(C(=O)O)O']
prbSmile='C(C(C(=O)O)O)O'
refMols=[Chem.AddHs(Chem.MolFromSmiles(x)) for x in refSmiles]
prbMol=Chem.AddHs(Chem.MolFromSmiles(prbSmile))
patt=Chem.MolFromSmiles("[H]OC([H])(C)C(=O)O[H]",sanitize=False)
helper=Chem.AddHs(Chem.MolFromSmiles("[H]OC([H])(C)C(=O)O[H]"))
AllChem.EmbedMolecule(helper,AllChem.ETKDG()) #Embed first reference molecule, create one conformer
AllChem.UFFOptimizeMolecule(helper) #Optimize the coordinates of the conformer
core = AllChem.DeleteSubstructs(AllChem.ReplaceSidechains(helper,patt),Chem.MolFromSmiles('*')) #Create core molecule with 3D coordinates
core.UpdatePropertyCache()
# align the molecules:
simShape,simEsp=EmbedAlignConstrainedScore(prbMol,refMols,core)
vectorized integration function (should just be a drop-in for GaussInt):
## Define vectorized gaussian integration functions:
def VecGI(dist, charge1, charge2):
    # These are precomputed coefficients:
    a = np.array([[ 15.90600036,   3.9534831 ,  17.61453176],
                  [  3.9534831 ,   5.21580206,   1.91045387],
                  [ 17.61453176,   1.91045387, 238.75820253]])
    b = np.array([[-0.02495   , -0.04539319, -0.00247124],
                  [-0.04539319, -0.2513    , -0.00258662],
                  [-0.00247124, -0.00258662, -0.0013    ]])
    a_flat = a.flatten()
    b_flat = b.flatten()
    # Squared distances and charge products, one entry per atom pair
    dist = (dist**2).flatten()
    charges = (charge1[:, None] * charge2).flatten()
    # Sum the nine Gaussian terms per pair, weight by charges, sum over pairs
    return ((a_flat[:, None] * np.exp(dist * b_flat[:, None])).sum(0) * charges).sum()

def vecSim(refCoor, prbCoor, refCharge, prbCharge, metric):
    distPrbPrb = cdist(prbCoor, prbCoor)
    distPrbRef = cdist(prbCoor, refCoor)
    distRefRef = cdist(refCoor, refCoor)
    intPrbPrb = VecGI(distPrbPrb, prbCharge, prbCharge)
    intPrbRef = VecGI(distPrbRef, prbCharge, refCharge)
    intRefRef = VecGI(distRefRef, refCharge, refCharge)
    # SimilarityMetric must be in scope here (it is not imported in the setup above)
    return SimilarityMetric(intPrbPrb, intRefRef, intPrbRef, metric)
test equivalence:
prbCoor = prbMol.GetConformer(0).GetPositions()
prbCharge = np.array([a.GetDoubleProp('_GasteigerCharge') for a in prbMol.GetAtoms()])
simEsp_vectorized = []
for refMol in refMols:
    refCoor = refMol.GetConformer(0).GetPositions()
    refCharge = np.array([a.GetDoubleProp('_GasteigerCharge') for a in refMol.GetAtoms()])
    metric = 'tanimoto'
    tanimoto_similarity = GetEspSim(prbMol, refMol, metric=metric, partialCharges='gasteiger')
    vectorized_tanimoto_similarity = vecSim(refCoor, prbCoor, refCharge, prbCharge, metric)
    print(np.isclose(tanimoto_similarity, vectorized_tanimoto_similarity), tanimoto_similarity, vectorized_tanimoto_similarity)
output:
True 0.6460256327061207 0.6460256327061223
True 0.6882976768462027 0.6882976768461678
True 0.6464181000546018 0.6464181000546553
True 0.4060487492735462 0.4060487492735308
True 0.30699421499052204 0.3069942149905257
True 0.23382614291885506 0.23382614291884982
True 0.6689556377160998 0.668955637716057
True 0.7182699415095307 0.718269941509474
True 0.7087896849268721 0.7087896849268774
The original takes ~6 ms versus ~100 µs for the vectorized version. VecGI should just be a drop-in replacement for GaussInt. It seems to work for Carbo and Tanimoto, so if it's of interest I'm happy to submit a PR.
cheers
Lewis
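To make explicit what the vectorized expression sums, the same quantity can be written as plain nested loops. This is a readability sketch of VecGI as posted above, using only the standard library; it is not claimed to be espsim's own GaussInt, and the name `gauss_overlap_loop` is made up for this example.

```python
import math

# Coefficients copied from the VecGI snippet, flattened row by row.
A = [15.90600036, 3.9534831, 17.61453176,
     3.9534831, 5.21580206, 1.91045387,
     17.61453176, 1.91045387, 238.75820253]
B = [-0.02495, -0.04539319, -0.00247124,
     -0.04539319, -0.2513, -0.00258662,
     -0.00247124, -0.00258662, -0.0013]


def gauss_overlap_loop(coords1, charges1, coords2, charges2):
    """Sum q1*q2 * sum_k A[k]*exp(B[k]*d^2) over all atom pairs."""
    total = 0.0
    for (x1, y1, z1), q1 in zip(coords1, charges1):
        for (x2, y2, z2), q2 in zip(coords2, charges2):
            d2 = (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
            pair_term = sum(a * math.exp(b * d2) for a, b in zip(A, B))
            total += q1 * q2 * pair_term
    return total
```

The vectorized version does the same arithmetic but builds the full table of pair distances and charge products once, which is where the speed-up comes from.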
From the tutorial I have been using the following code block to view the alignment of my prbMol to my refMol.
p = py3Dmol.view(width=400,height=400)
dt = {}
for i in range(len(prbMols)):
    dt[i] = [prbMols[i], refMol]
interact(draw, ms=dt,p=fixed(p),confIds=fixed([6,6]));
However, there appears to be no clear way to select the confIds that gave the best simShape or simEsp score; instead you have to manually iterate through the confIds until you see a pair that gives a reasonable overlay. Is there any way to automatically retrieve the confIds of the 'best scoring' conformations?
Thanks,
Noah
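If a score is available for every probe/reference conformer pair, picking the best pair is a simple search over that table. The sketch below assumes such a table has been built separately, for example by scoring each conformer pair one at a time; whether espsim exposes per-pair scores directly is not established here, and `best_conformer_pair` is a hypothetical helper.

```python
def best_conformer_pair(scores):
    """Return (probe_conf_index, ref_conf_index, score) of the highest entry.

    `scores[i][j]` is the similarity of probe conformer i to reference
    conformer j. Returns None for an empty table.
    """
    best = None
    for i, row in enumerate(scores):
        for j, value in enumerate(row):
            if best is None or value > best[2]:
                best = (i, j, value)
    return best


# Two probe conformers scored against two reference conformers.
example_scores = [
    [0.1, 0.4],
    [0.9, 0.3],
]
best_probe, best_ref, best_score = best_conformer_pair(example_scores)
```

The two indices could then replace the hard-coded `[6,6]` in the viewer call, assuming `confIds` is ordered as probe first and reference second, matching the order of each `dt[i]` entry.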
Does espsim support computing the 3D shape and chemical similarity of two identical substructures from two molecules?
If not, do you have any suggestions?
The installation fails to complete, even with a clean conda installation. Problems include resp and psi4 not being available for modern versions of Python (> 3.6).
Trying to install ESPSim locally and conda is taking an age to solve the environment when using the .yml file. It is likely this is due to the .yml file not including specification of the Python or package versions required for a working install. Is it possible to upload a version which contains the dependencies and their versions?
Thanks,
Noah
espsim/espsim/electrostatics.py
Line 180 in da46426
Thank you for an excellent and easy-to-use package. I was wondering if you could shed some light on the origin of these coefficients, and their significance?