gcorso / torsional-diffusion
Implementation of Torsional Diffusion for Molecular Conformer Generation (NeurIPS 2022)
Home Page: https://arxiv.org/abs/2206.01729
License: MIT License
I trained the torsional diffusion model on my own dataset and was interested in generating conformers with likelihoods. I use the following command to generate the conformers:
python generate_confs.py --test_csv test.csv --inference_steps 20 --model_dir run/ --out conformers_20steps.pkl --tqdm --batch_size 128 --ode --likelihood full
But when I check euclidean_dlogp, the values are between -30 and +30. Do you know why this is? This is the log-likelihood, right? So shouldn't it always be negative? I have tried both the full and hutchinson methods.
Thanks for your help in advance!
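One general point that may be relevant: a log-density over continuous coordinates is not bounded above by zero, because a density (unlike a probability) can exceed 1. The sketch below is a generic 1-D Gaussian illustration and uses nothing from this repository; `gaussian_logpdf` is a hypothetical helper, and whether this fully explains the values of euclidean_dlogp is not confirmed here.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log of the normal density N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


# A sharply peaked density exceeds 1 at its mode, so its log is positive.
narrow = gaussian_logpdf(0.0, 0.0, 0.01)
# A broad density stays below 1 everywhere, so its log is negative.
wide = gaussian_logpdf(0.0, 0.0, 10.0)

print("narrow:", narrow)
print("wide:", wide)
```

The sign of a log-density therefore depends on how concentrated the distribution is and on the units of the coordinates, not on whether the computation is correct.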
I would like to provide my own dataset for retraining torsional-diffusion. There are some fields in the pickle file for which I do not know what value to use. For example, each entry in the conformers list is a dictionary like the following:
{'geom_id': 123368967, 'set': 1, 'degeneracy': 3, 'totalenergy': -23.59133734, 'relativeenergy': 0.0, 'boltzmannweight': 0.8585, 'conformerweights': [0.28617, 0.28617, 0.28616], 'rd_mol': <rdkit.Chem.rdchem.Mol at 0x7f7b42014bd0>}
What should I put for boltzmannweight and degeneracy? Is there a setup script to take molfiles and convert them into the dataset for training?
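In the entry quoted above, conformerweights has three values (matching degeneracy = 3) and they sum to boltzmannweight (0.8585). A minimal sketch of how such weights could be derived from relative energies follows. It is not a script from this repository: `boltzmann_weights` is a hypothetical helper, and the energy unit (kcal/mol) and temperature (298.15 K) are assumptions that should be checked against how your energies were computed.

```python
import math

# Assumption: relative energies in kcal/mol, temperature 298.15 K.
KT_KCAL_PER_MOL = 0.0019872041 * 298.15


def boltzmann_weights(relative_energies, degeneracies):
    """Normalized weights proportional to g_i * exp(-E_i / kT)."""
    unnormalized = [
        g * math.exp(-e / KT_KCAL_PER_MOL)
        for e, g in zip(relative_energies, degeneracies)
    ]
    total = sum(unnormalized)
    return [w / total for w in unnormalized]


# Two hypothetical conformers: the ground state (degeneracy 3) and one
# conformer 1 kcal/mol higher (degeneracy 1).
weights = boltzmann_weights([0.0, 1.0], [3, 1])
print(weights)

# Consistency check on the values quoted in the post.
entry = {
    "degeneracy": 3,
    "boltzmannweight": 0.8585,
    "conformerweights": [0.28617, 0.28617, 0.28616],
}
weights_match = abs(sum(entry["conformerweights"]) - entry["boltzmannweight"]) < 1e-4
print(weights_match, len(entry["conformerweights"]) == entry["degeneracy"])
```

For a dataset with a single conformer per molecule and no energies, the same function returns a weight of 1.0 when called with `[0.0]` and `[1]`.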
Hi,
Your work is really interesting! However, when I ran your code, I found a code block that runs very slowly:
torsional-diffusion/diffusion/torus.py, line 68
score_norm_ = score(
    sample(sigma[None].repeat(10000, 0).flatten()),
    sigma[None].repeat(10000, 0).flatten()
).reshape(10000, -1)
score_norm_ = (score_norm_ ** 2).mean(0)
I wonder why sigma is repeated 10000 times? Is there any way to make it faster?
Thanks!
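The quoted lines appear to be a Monte Carlo estimate of the expected squared score for each sigma: repeating sigma 10000 times draws 10000 samples per sigma value, and `.mean(0)` averages over them. The standalone sketch below reproduces that idea for a 1-D wrapped normal in pure Python. It is not the repository's implementation; all function names and the truncation `n_terms` are assumptions made for illustration.

```python
import math
import random


def wrapped_normal_score(x, sigma, n_terms=10):
    """d/dx log p(x) for a zero-mean normal with std sigma wrapped onto the circle."""
    p = 0.0
    dp = 0.0
    for k in range(-n_terms, n_terms + 1):
        y = x + 2.0 * math.pi * k
        term = math.exp(-y * y / (2.0 * sigma * sigma))
        p += term
        dp += -y / (sigma * sigma) * term
    return dp / p


def sample_wrapped_normal(sigma, rng):
    """Draw from N(0, sigma^2) and wrap the result into [-pi, pi)."""
    x = rng.gauss(0.0, sigma)
    return (x + math.pi) % (2.0 * math.pi) - math.pi


def estimate_score_norm(sigma, n_samples, rng):
    """Monte Carlo estimate of E[score(x)^2] under the wrapped normal."""
    total = 0.0
    for _ in range(n_samples):
        s = wrapped_normal_score(sample_wrapped_normal(sigma, rng), sigma)
        total += s * s
    return total / n_samples


rng = random.Random(0)
small_sigma_norm = estimate_score_norm(0.1, 10000, rng)
large_sigma_norm = estimate_score_norm(3.0, 10000, rng)
print(small_sigma_norm, large_sigma_norm)
```

The Monte Carlo error shrinks roughly as 1/sqrt(n_samples), so a smaller sample count trades accuracy for speed. For small sigma the estimate is close to 1/sigma^2, and it becomes very small for large sigma, where the wrapped distribution approaches uniform.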
torsional-diffusion/diffusion/score_model.py
Line 118 in fcad6fb
The variable new_means is an empty list in the e3nn code, which leads to this error, because at least one scalar irrep is needed to calculate new_means in the BatchNorm layer.
Hi.
I want to train the torsional-diffusion model on the QM9 dataset, but how can I get the split.npy file, or generate it with your code?
Please let me know the detailed steps for generating the dataset.
Thanks!
Hi,
First of all thank you for making your code available. I was curious to test out your conformer generation, but unfortunately ran into a few issues that I was hoping you could help me with.
A short report on what I did and which errors I ran into (my machine is running Ubuntu 22.04)
../torsional-diffusion/diffusion/torus.py:33: RuntimeWarning: invalid value encountered in divide
score_ = grad(x, sigma[:, None], N=100) / p_
.../.conda/envs/torsional_diffusion/lib/python3.9/site-packages/torch/jit/_check.py:181: UserWarning: The TorchScript type system doesn't support instance-level annotations on empty non-base types in __init__. Instead, either 1) use a type annotation in the class body, or 2) wrap the type in torch.jit.Attribute.
  warnings.warn("The TorchScript type system doesn't support "
0it [00:00, ?it/s]
Generated conformers for 0 molecules
Would you have any idea what might be causing this behaviour?
Thanks in advance for any pointers you can give me and kind regards,
Jessica
After putting QM9 in the data folder, running the following command
python train.py --log_dir ./test_run --cache data/QM9/cache --data_dir data/QM9/qm9 --std_pickles data/QM9/standardized_pickles --split_path data/QM9/split.npy --dataset=qm9
keeps giving the error "num_samples should be a positive integer value, but got num_samples=0". I checked, and all the directories are correct. Is there something special about qm9 that merits a different treatment? Thanks.
Hello, thanks for your good work. Would you be willing to provide checkpoints in the future?
Best wishes to you.
Traceback (most recent call last):
  File "e:\Cheminfo_Workshop\5_Docking_Lab\torsional-diffusion-master\generate_confs.py", line 11, in <module>
    from diffusion.sampling import *
  File "e:\Cheminfo_Workshop\5_Docking_Lab\torsional-diffusion-master\diffusion\sampling.py", line 4, in <module>
    from diffusion.likelihood import *
  File "e:\Cheminfo_Workshop\5_Docking_Lab\torsional-diffusion-master\diffusion\likelihood.py", line 7, in <module>
    from utils.xtb import *
  File "e:\Cheminfo_Workshop\5_Docking_Lab\torsional-diffusion-master\utils\xtb.py", line 12, in <module>
    os.mkdir(my_dir)
FileNotFoundError: [WinError 3] The system cannot find the path specified: '/tmp/8508'
I get this error when the code tries to set up its temp directory.
How can I change the temp directory? Many thanks,
best,
Sh-Y
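The path '/tmp/8508' in the traceback suggests that my_dir in utils/xtb.py is built from a hard-coded '/tmp' prefix and the process ID, which does not exist on Windows. That reading is an assumption, since the relevant source line is not shown here. Under that assumption, a platform-independent way to build the same kind of directory could look like this sketch:

```python
import os
import tempfile

# Assumption: the original builds my_dir as '/tmp/<pid>'.
# tempfile.gettempdir() returns the platform's temp location instead
# (for example a path under %TEMP% on Windows, '/tmp' on most Linux systems).
my_dir = os.path.join(tempfile.gettempdir(), str(os.getpid()))

# makedirs with exist_ok=True creates missing parents and does not fail
# if the directory already exists.
os.makedirs(my_dir, exist_ok=True)

print(my_dir)
```

Whether the rest of the xTB utilities work on Windows after such a change is a separate question that this sketch does not address.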
Hi,
Thanks for your great work! I wonder why an optimization is needed to perform conformer matching. Could we just set the ground-truth torsion angles on our RDKit-generated conformers and use those as targets? Am I missing something here?
Thank you in advance!
Best regards,
Lin
Hi, I am trying to train a torsional diffusion model. Since torus.score_norm() is very small when sigma is large, the training loss can be enormous. Is this the expected behavior? Do you suggest using a smaller sigma_max for numerical stability?
Hi, I have a question about how you check whether a bond is rotatable. It seems you do it by checking whether a networkx graph is still connected after removing that bond, but do you take the bond order into account anywhere?
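The connectivity test described in this question can be sketched as follows. This is not the repository's code: it uses a plain adjacency-set graph instead of networkx, `component_of` and `is_candidate_rotatable` are hypothetical names, and the `single_bonds_only` filter is an added assumption showing one way bond order could be taken into account.

```python
from collections import deque


def component_of(adjacency, start, removed_edge):
    """Atoms reachable from `start` when `removed_edge` is ignored."""
    a, b = removed_edge
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in adjacency[node]:
            if {node, nbr} == {a, b}:
                continue  # skip the bond under test
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen


def is_candidate_rotatable(bonds, edge, single_bonds_only=True):
    """`bonds` maps (i, j) atom-index tuples to bond orders."""
    adjacency = {}
    for i, j in bonds:
        adjacency.setdefault(i, set()).add(j)
        adjacency.setdefault(j, set()).add(i)
    u, v = edge
    side_u = component_of(adjacency, u, edge)
    if v in side_u:
        return False  # still connected: the bond is part of a ring
    side_v = component_of(adjacency, v, edge)
    if len(side_u) < 2 or len(side_v) < 2:
        return False  # rotating a single terminal atom changes nothing
    if single_bonds_only and bonds[edge] != 1:
        return False  # assumed filter: exclude double and triple bonds
    return True


# Heavy-atom graphs of three small hypothetical test molecules.
butane = {(0, 1): 1, (1, 2): 1, (2, 3): 1}
butene = {(0, 1): 1, (1, 2): 2, (2, 3): 1}
cyclopropane = {(0, 1): 1, (1, 2): 1, (0, 2): 1}

print(is_candidate_rotatable(butane, (1, 2)))
print(is_candidate_rotatable(butene, (1, 2)))
print(is_candidate_rotatable(butene, (1, 2), single_bonds_only=False))
```

With a purely topological test, the central double bond of the 2-butene graph passes, which is the case the question is asking about. The bond-order filter is what excludes it in this sketch.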