aspuru-guzik-group / chemical_vae
Code for 10.1021/acscentsci.7b00572, now running on Keras 2.0 and Tensorflow
License: Apache License 2.0
Is it possible to use the program just to encode a set of molecules in an sdf file to the vector representation? I would like to use this representation as feature vectors in my code for binding affinity prediction.
Hello,
What is the average runtime for training on the ZINC dataset (70 epochs)? I'm running this on a dual-GPU workstation and a single epoch takes a very long time. Are there additional parameter settings I need to change?
Thanks!
Dear All,
The paper mentions that 250,000 drug-like molecules were used to train the autoencoder,
and that 2,000 molecules were used to train the Gaussian process.
However, the provided command-line tool accepts only one input.
Hence the question: is it possible to train the property prediction and reconstruction tasks separately, and how was this separate training achieved in the paper?
How should the code be executed, assuming we have the following two data sets:
• a limited number of molecules with SMILES and properties, and
• a large number of molecules with SMILES and without properties (possibly overlapping with the smaller data set)?
Kind regards!
If I have a new dataset, how can I use your code to train on it? It would be great if you could provide a procedure. Do you have any documentation for this code? Many thanks.
I found that variational_layers gives the posterior log variances for the latent variables from the last encoding layer through a dense layer. However, this dense layer in variational_layers is not saved after training, so users cannot obtain the latent posterior log variances or samples.
The example provided only uses the posterior means to decode to SMILES. Can you actually encode SMILES to posterior samples rather than posterior means after training?
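For reference, drawing a posterior sample from a mean and a log variance is usually done with the reparameterization z = mean + exp(0.5 * log_var) * eps, where eps is standard normal. The sketch below only illustrates that formula with the standard library. The function name `sample_posterior` and its inputs are hypothetical, and it assumes the log variances can be obtained from the trained encoder, which is exactly what this post says is currently not possible.

```python
import math
import random


def sample_posterior(z_mean, z_log_var, rng=None):
    """Draw one sample per latent dimension from N(mean, exp(log_var)).

    z_mean and z_log_var are plain sequences of floats (hypothetical
    inputs; the repository does not expose the log variances).
    """
    rng = rng or random.Random()
    return [
        # Standard deviation is exp(0.5 * log_var); eps ~ N(0, 1).
        mean + math.exp(0.5 * log_var) * rng.gauss(0.0, 1.0)
        for mean, log_var in zip(z_mean, z_log_var)
    ]


if __name__ == "__main__":
    print(sample_posterior([0.0, 1.0, -1.0], [0.0, -2.0, -4.0], random.Random(0)))
```

With very negative log variances the samples collapse onto the means, which is the behaviour of the mean-only encoding in the example notebook.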
TypeError Traceback (most recent call last)
in
3 environ['KERAS_BACKEND'] = 'tensorflow'
4 # vae stuff
----> 5 from chemvae.vae_utils import VAEUtils
6 from chemvae import mol_utils as mu
7 # import scientific py
c:\Users\yhtru\anaconda3\envs\chemvae\lib\importlib_bootstrap.py in find_and_load(name, import)
c:\Users\yhtru\anaconda3\envs\chemvae\lib\importlib_bootstrap.py in find_and_load_unlocked(name, import)
c:\Users\yhtru\anaconda3\envs\chemvae\lib\importlib_bootstrap.py in _load_unlocked(spec)
c:\Users\yhtru\anaconda3\envs\chemvae\lib\importlib_bootstrap.py in _load_backward_compatible(spec)
c:\Users\yhtru\anaconda3\envs\chemvae\lib\site-packages\chemvae-1.0.0-py3.6.egg\chemvae\vae_utils.py in
3 import random
4 import yaml
----> 5 from .models import load_encoder, load_decoder, load_property_predictor
6 import numpy as np
7 import pandas as pd
c:\Users\yhtru\anaconda3\envs\chemvae\lib\importlib_bootstrap.py in find_and_load(name, import)
...
---> 82 class ArgumentParser(Generic[_T], metaclass=_ArgumentParserCache):
83 """Base class used to parse and convert arguments.
84
TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases
When I run the command with a data-set specified in exp.json it trains the model, but does not return a file with the final encoded representation of the smiles. How can I change that?
Hi all,
I have installed chemical_vae with the following commands:
~/anaconda2/bin/pip install -r requirements.txt
~/anaconda2/bin/conda env create -f environment.yml
source ~/anaconda2/bin/activate chemvae
python setup.py install
When I run "python -m chemvae.train_vae", it gives the following errors:
/home/lili/anaconda2/envs/chemvae/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
return f(*args, **kwds)
/home/lili/anaconda2/envs/chemvae/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
Using Theano backend.
Traceback (most recent call last):
File "/home/lili/anaconda2/envs/chemvae/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/lili/anaconda2/envs/chemvae/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/lili/Programs/chemical_vae-master/chemvae/train_vae.py", line 30, in <module>
from .models import encoder_model, load_encoder
File "/home/lili/Programs/chemical_vae-master/chemvae/models.py", line 10, in <module>
from .tgru_k2_gpu import TerminalGRU
File "/home/lili/Programs/chemical_vae-master/chemvae/tgru_k2_gpu.py", line 72, in <module>
raise NotImplemented("Backend not implemented")
TypeError: 'NotImplementedType' object is not callable
How can I solve this problem? Thank you very much.
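A note on what the traceback seems to show: the log says "Using Theano backend.", so the module reaches its unsupported-backend branch, and that branch raises `NotImplemented` (a comparison sentinel, not an exception), which is why Python reports a TypeError instead of the intended message. A likely workaround is to select the TensorFlow backend before Keras is imported, for example through the `KERAS_BACKEND` environment variable that the example notebook sets. The sketch below only demonstrates the difference between the two names; `require_backend` is a hypothetical helper, not a function from the repository.

```python
def require_backend(name):
    """Return the backend name, or fail clearly if it is unsupported.

    Hypothetical helper; the set of supported backends is an assumption.
    """
    supported = {"tensorflow"}
    if name not in supported:
        # NotImplementedError is the exception class. NotImplemented is a
        # singleton used by comparison operators and cannot be called.
        raise NotImplementedError("Backend not implemented: " + name)
    return name


def call_sentinel():
    """Reproduce the TypeError seen in the traceback above."""
    try:
        raise NotImplemented("Backend not implemented")
    except TypeError as err:
        return str(err)


if __name__ == "__main__":
    print(require_backend("tensorflow"))
    print(call_sentinel())
```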
When I run the "Decode several attempts" part of intro_to_chemvae.ipynb, I get the following errors:
AttributeError Traceback (most recent call last)
D:\Anaconda3\envs\chemvae\lib\site-packages\PIL\ImageFile.py in _save(im, fp, tile, bufsize)
481 try:
--> 482 fh = fp.fileno()
483 fp.flush()
AttributeError: '_idat' object has no attribute 'fileno'
During handling of the above exception, another exception occurred:
SystemError Traceback (most recent call last)
D:\Anaconda3\envs\chemvae\lib\site-packages\IPython\core\formatters.py in call(self, obj)
334 method = get_real_method(obj, self.print_method)
335 if method is not None:
--> 336 return method()
337 return None
338 else:
D:\Anaconda3\envs\chemvae\lib\site-packages\PIL\Image.py in repr_png(self)
655 from io import BytesIO
656 b = BytesIO()
--> 657 self.save(b, 'PNG')
658 return b.getvalue()
659
D:\Anaconda3\envs\chemvae\lib\site-packages\PIL\Image.py in save(self, fp, format, **params)
1928
1929 try:
-> 1930 save_handler(self, fp, filename)
1931 finally:
1932 # do what we can to clean up
D:\Anaconda3\envs\chemvae\lib\site-packages\PIL\PngImagePlugin.py in _save(im, fp, filename, chunk)
819
820 ImageFile._save(im, _idat(fp, chunk),
--> 821 [("zip", (0, 0)+im.size, 0, rawmode)])
822
823 chunk(fp, b"IEND", b"")
D:\Anaconda3\envs\chemvae\lib\site-packages\PIL\ImageFile.py in _save(im, fp, tile, bufsize)
488 if o > 0:
489 fp.seek(o, 0)
--> 490 e.setimage(im.im, b)
491 if e.pushes_fd:
492 e.setfd(fp)
SystemError: tile cannot extend outside image
<PIL.Image.Image image mode=RGBA size=1000x0 at 0x2100EA6F128>
python -m chemvae.train_vae
/usr/local/anaconda3/envs/chemvae/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:523: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/anaconda3/envs/chemvae/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:524: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/anaconda3/envs/chemvae/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/anaconda3/envs/chemvae/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/anaconda3/envs/chemvae/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/anaconda3/envs/chemvae/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:532: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
Using Theano backend.
Traceback (most recent call last):
File "/usr/local/anaconda3/envs/chemvae/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/local/anaconda3/envs/chemvae/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/berenger/src/chemical_vae/chemvae/train_vae.py", line 30, in <module>
from .models import encoder_model, load_encoder
File "/home/berenger/src/chemical_vae/chemvae/models.py", line 10, in <module>
from .tgru_k2_gpu import TerminalGRU
File "/home/berenger/src/chemical_vae/chemvae/tgru_k2_gpu.py", line 73, in <module>
raise NotImplemented("Backend not implemented")
TypeError: 'NotImplementedType' object is not callable
Is there a version of this that runs on tensorflow>=2.0?
The code is difficult to work with
Where can I find the model's prediction results, please?
After hours of tinkering on (Arch) Linux and not getting it to run with miniconda and optirun (running code on an Intel/NVIDIA laptop GPU), I gave up after 3 hours and tried to get it running on Windows.
So far I have several remarks:
• Use activate chemvae without source.
• Change C:\Users\<username>\Anaconda3\envs\chemvae\etc\conda\activate.d\keras_activate.bat to backend=tensorflow.
• Run pip install --upgrade tensorflow-gpu==1.10 after install, otherwise it does not properly recognize my GPU device. This also requires editing keras_activate.bat.
How to reproduce: install the Visual Studio 2015 build tools on a fresh install of Windows 10, then run the install script and then python -m chemvae.train_vae
I am using Anaconda Python 3.6 with latest Keras and other libraries. When I run
python -m chemvae.train_vae
It complains:
Traceback (most recent call last):
File "Anaconda3\lib\runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "Anaconda3\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "chemical_vae\chemvae\train_vae.py", line 399, in
main_no_prop(params)
File "chemical_vae\chemvae\train_vae.py", line 229, in main_no_prop
AE_only_model, encoder, decoder, kl_loss_var = load_models(params)
File "chemical_vae\chemvae\train_vae.py", line 161, in load_models
decoder = decoder_model(params)
File "chemical_vae\chemvae\models.py", line 137, in decoder_model
implementation=params['terminal_GRU_implementation'])([x_dec, true_seq_in])
File "chemical_vae\chemvae\tgru_k2_gpu.py", line 86, in init
self.units = units
AttributeError: can't set attribute
I tried the suggestion in keras-team/keras#7736 and changed the related lines to:
try:
    self.units = units
except AttributeError:
    self._units = units
try:
    self.recurrent_dropout = min(1., max(0., recurrent_dropout))
except AttributeError:
    self._recurrent_dropout = min(1., max(0., recurrent_dropout))
Then it complains:
ValueError: Layer decoder_tgru expects 3 inputs, but it received 2 input tensors. Input received: [<tf.Tensor 'decoder_gru2_2/transpose_1:0' shape=(?, ?, 488) dtype=float32>, <tf.Tensor 'decoder_true_seq_input_2:0' shape=(?, 120, 35) dtype=float32>]
Please help...
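For context, "can't set attribute" is what Python raises when code assigns to a name that a base class defines as a read-only property. The sketch below reproduces that mechanism with stand-in classes. `BaseLayer`, `BrokenLayer` and `FixedLayer` are hypothetical and are not the Keras or chemvae classes, and it is an assumption that newer Keras exposes `units` on its GRU layer this way. Note that swallowing the AttributeError only hides the first symptom. The follow-up "expects 3 inputs" error suggests the layer depends on an older Keras API in other places too.

```python
class BaseLayer:
    """Stand-in for a base class that exposes `units` as a read-only property."""

    def __init__(self, units):
        self._units = units

    @property
    def units(self):
        return self._units


class BrokenLayer(BaseLayer):
    def __init__(self, units):
        # The property has no setter, so this assignment raises AttributeError.
        self.units = units


class FixedLayer(BaseLayer):
    def __init__(self, units):
        # Let the base class store the value instead of assigning directly.
        super().__init__(units)


if __name__ == "__main__":
    try:
        BrokenLayer(8)
    except AttributeError as err:
        print("BrokenLayer failed:", err)
    print("FixedLayer units:", FixedLayer(8).units)
```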
Running the notebook 'intro_to_chemvae.ipynb' returns this:
AttributeError: in user code:
/home/sc/ml/chemical_space/chemical_vae-main/chemvae/tgru_k2_gpu.py:207 call *
preprocessed_input = self.preprocess_input(X)
AttributeError: 'TerminalGRU' object has no attribute 'preprocess_input'
Has anybody else run into this issue?
I would appreciate an answer or discussion.
Which RDKit version is used in this project? I often get the following error when running tests, such as:
Traceback (most recent call last):
File "D:\VAE\chemical_vae-main\chemical_vae-main\chemvae\create.py", line 5, in
from chemvae.vae_utils import VAEUtils
File "D:\VAE\chemical_vae-main\chemical_vae-main\chemvae\vae_utils.py", line 1, in
from . import mol_utils as mu
File "D:\VAE\chemical_vae-main\chemical_vae-main\chemvae\mol_utils.py", line 4, in
from rdkit.Chem import AllChem as Chem
File "C:\Users\ASUS.conda\envs\rdkit\lib\site-packages\rdkit_init_.py", line 2, in
from .rdBase import rdkitVersion as version
ImportError: DLL load failed
In the last cell, f = pd.DataFrame(np.transpose((Z_tsne[:,0],Z_tsne[:,1]))) should be changed to
df = pd.DataFrame(np.transpose((Z_tsne[:,0],Z_tsne[:,1])))
otherwise there is a DataFrame error.
Hi! I wanted to try running the intro_to_chemvae.ipynb example. I installed the environment via Anaconda, but I encountered the following issue when importing VAEUtils from chemvae.vae_utils:
TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases
I have the same problem when running the autoencoder on the ZINC dataset as described in the example (python -m chemvae.train_vae)
Any suggestions on how to fix this? Thanks!
Simone
TypeError Traceback (most recent call last)
in
3 environ['KERAS_BACKEND'] = 'tensorflow'
4 # vae stuff
----> 5 from chemvae.vae_utils import VAEUtils
6 from chemvae import mol_utils as mu
7 # import scientific py
~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/importlib/_bootstrap.py in find_and_load(name, import)
~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/importlib/_bootstrap.py in find_and_load_unlocked(name, import)
~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/importlib/_bootstrap.py in _load_unlocked(spec)
~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/importlib/_bootstrap.py in _load_backward_compatible(spec)
~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/site-packages/chemvae-1.0.0-py3.6.egg/chemvae/vae_utils.py in
3 import random
4 import yaml
----> 5 from .models import load_encoder, load_decoder, load_property_predictor
6 import numpy as np
7 import pandas as pd
~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/importlib/_bootstrap.py in find_and_load(name, import)
~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/importlib/_bootstrap.py in find_and_load_unlocked(name, import)
~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/importlib/_bootstrap.py in _load_unlocked(spec)
~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/importlib/_bootstrap.py in _load_backward_compatible(spec)
~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/site-packages/chemvae-1.0.0-py3.6.egg/chemvae/models.py in
----> 1 from keras.layers import Input, Lambda
2 from keras.layers.core import Dense, Flatten, RepeatVector, Dropout
3 from keras.layers.convolutional import Convolution1D
4 from keras.layers.recurrent import GRU
5 from keras.layers.normalization import BatchNormalization
~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/site-packages/keras/init.py in
1 from future import absolute_import
2
----> 3 from . import utils
4 from . import activations
5 from . import applications
~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/site-packages/keras/utils/init.py in
4 from . import data_utils
5 from . import io_utils
----> 6 from . import conv_utils
7
8 # Globally-importable utils.
~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/site-packages/keras/utils/conv_utils.py in
1 from six.moves import range
2 import numpy as np
----> 3 from .. import backend as K
4
5
~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/site-packages/keras/backend/init.py in
81 elif _BACKEND == 'tensorflow':
82 sys.stderr.write('Using TensorFlow backend.\n')
---> 83 from .tensorflow_backend import *
84 else:
85 raise ValueError('Unknown backend: ' + str(_BACKEND))
~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py in
----> 1 import tensorflow as tf
2 from tensorflow.python.training import moving_averages
3 from tensorflow.python.ops import tensor_array_ops
4 from tensorflow.python.ops import control_flow_ops
5 from tensorflow.python.ops import functional_ops
~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/site-packages/tensorflow/init.py in
20
21 # pylint: disable=g-bad-import-order
---> 22 from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
23
24 try:
~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/site-packages/tensorflow/python/init.py in
61
62 # Framework
---> 63 from tensorflow.python.framework.framework_lib import * # pylint: disable=redefined-builtin
64 from tensorflow.python.framework.versions import *
65 from tensorflow.python.framework import errors
~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/site-packages/tensorflow/python/framework/framework_lib.py in
23 # Classes used when building a Graph.
24 from tensorflow.python.framework.device import DeviceSpec
---> 25 from tensorflow.python.framework.ops import Graph
26 from tensorflow.python.framework.ops import Operation
27 from tensorflow.python.framework.ops import Tensor
~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in
53 from tensorflow.python.framework import versions
54 from tensorflow.python.ops import control_flow_util
---> 55 from tensorflow.python.platform import app
56 from tensorflow.python.platform import tf_logging as logging
57 from tensorflow.python.util import compat
~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/site-packages/tensorflow/python/platform/app.py in
22 import sys as _sys
23
---> 24 from tensorflow.python.platform import flags
25 from tensorflow.python.util.tf_export import tf_export
26
~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/site-packages/tensorflow/python/platform/flags.py in
23
24 # go/tf-wildcard-import
---> 25 from absl.flags import * # pylint: disable=wildcard-import
26 import six as _six
27
~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/site-packages/absl/flags/init.py in
33 import warnings
34
---> 35 from absl.flags import _argument_parser
36 from absl.flags import _defines
37 from absl.flags import _exceptions
~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/site-packages/absl/flags/_argument_parser.py in
80
81
---> 82 class ArgumentParser(Generic[_T], metaclass=_ArgumentParserCache):
83 """Base class used to parse and convert arguments.
84
TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases
Hi, I have a question: does this code work for time series?
Dear friends,
I tried to run the code on input SMILES. It worked well for some given molecules. However, I also came across some problems I could not figure out.
Traceback (most recent call last):
in
z_1 = vae.encode(X_1)
File "/***/chemical_vae-master/chemvae/vae_utils.py", line 163, in encode
return self.standardize_z(self.enc.predict(X)[0])
IndexError: list index out of range
Any ideas how to fix it?
Thanks,
Cheng
Hi again,
Could you paste the exact versions of the Python packages needed to successfully run your code on a GPU?
The problem is that when installing all packages using your environment.yml and setup.py files, the package versions are not compatible and computations are only possible on the CPU.
In the models directory, I noticed that there are two model directories. What is the difference between these two models?
Hi guys,
I have encountered the following "weird" behaviour when I sample the latent space near a given SMILES: the output molecules change very little with the specified noise level. My installation seems to be okay and reproduces the examples (I'm using a CPU-based installation), so I wonder whether I am missing something. I provide some examples below, but the same holds for many other molecules. (For the cases here I take only 100 samples, but for "production" work I take tens of thousands, and the pattern remains.)
Noise 200:
$ python get_vae_smiles.py "CSCC(=O)NNC(=O)c1c(C)oc(C)c1C" 2>/dev/null
Using standarized functions? True
Standarization: estimating mu and std values ...done!
Input : CSCC(=O)NNC(=O)c1c(C)oc(C)c1C
Reconstruction : CSCC(=O)N(C(=O)c1c(C)oc(C)c1C
Z representation : (1, 196) with norm 10.705
Searching molecules randomly sampled from 200.00 std (z-distance) from the point
Found 10 unique mols, out of 30
SMILES
0 CSCC(=O)NNC(=O)c1c(C)oc(C)c1C
1 CSC(C=O)NNC(=O)c1c(C)oc(C)c1C
2 COCC(=O)NC(C=O)c1c(C)oc(C)c1C
3 CSCC(=O)NCC(=O)c1c(C)oc(C)c1C
4 COCC(=O)NCC(=O)c1c(C)oc(C)c1C
5 CSC(C=O)NCC(=O)c1c(C)oc(C)c1C
6 COCC(=O)NCC(=O)c1c(C)oc(C)c1Cl
7 CSC(C=O)NCC(=O)c1c(F)oc(C)c1C
8 COCC(=O)NC(=O)c1cc(O)nc(C)c1C
9 C#COC(=N)NC(=O)c1ccccc(Cl)cc1Cl
Name: smiles, dtype: object
Noise 2:
Searching molecules randomly sampled from 2.00 std (z-distance) from the point
Found 13 unique mols, out of 75
SMILES
0 CSCC(=O)NNC(=O)c1c(C)oc(C)c1C
1 CSC(C=O)NNC(=O)c1c(C)oc(C)c1C
2 CSCC(=O)NC(C=O)c1c(C)oc(C)c1C
3 COCC(=O)NC(C=O)c1c(C)oc(C)c1C
4 CSCC(=O)NCC(=O)c1c(C)oc(C)c1C
5 COC(C=O)NNC(=O)c1c(C)oc(C)c1C
6 COCC(=O)NCC(=O)c1c(C)oc(C)c1C
7 CSCC(=O)NCC(=O)c1c(O)oc(C)c1C
8 CSC(C=O)NCC(=O)c1c(C)oc(C)c1C
9 CSC(C=O)NCC(=O)c1c(F)oc(C)c1C
10 COC(C=O)NCC(=O)c1c(C)oc(C)c1C
11 CSCC(=O)N/C(=O)c1c(C)oc(C)c1C
12 ClCC(=O)NCC(=O)c1c(C)oc(C)c1C
Name: smiles, dtype: object
Searching molecules randomly sampled from 50.00 std (z-distance) from the point
Found 14 unique mols, out of 65
SMILES
0 CSCC(=O)NNC(=O)c1c(C)oc(C)c1C
1 COCC(=O)NNC(=O)c1c(C)oc(C)c1C
2 CSC(C=O)NNC(=O)c1c(C)oc(C)c1C
3 COCC(=O)NC(C=O)c1c(C)oc(C)c1C
4 CSCC(=O)NCC(=O)c1c(C)oc(C)c1C
5 CSC(C=O)NC(C=O)c1c(C)oc(C)c1C
6 COCC(=O)NCC(=O)c1c(C)oc(C)c1C
7 CSC(C=O)NCC(=O)c1c(C)oc(C)c1C
8 CSC(C=O)NCC(=O)c1c(F)oc(C)c1C
9 COC(C=O)NCC(=O)c1c(C)oc(C)c1C
10 CSCC(=O)N/C(=O)c1c(C)oc(C)c1C
11 ClC(C=O)NCC(=O)c1c(C)oc(C)c1C
12 ClCC(=O)NCC(=O)c1c(C)oc(C)c1C
13 ClCC(=O)NC(C=O)c1c(C)oc(C)c1C
Name: smiles, dtype: object
So it seems that for large z-distances the SMILES are not much different from those for small distances. What is the distribution of the random sampling? I would expect this behaviour if the random sampling is not uniform and is heavily biased towards the coordinates of the input SMILES, so that the specified noise level affects only the periphery and most output molecules still originate from the close neighbourhood of the input SMILES.
I would greatly appreciate any help with this issue.
Best wishes,
Gyorgy Abrusan
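One way to take the sampling distribution out of the picture is to draw every perturbation at exactly the requested distance from the encoded point: pick a Gaussian random direction, normalize it, and scale it to the chosen radius. The sketch below shows that scheme with the standard library only. `perturb_on_sphere` is a hypothetical helper and this is an assumption, not a description of how chemvae draws its noise. If decoded molecules still barely change under this scheme, the cause would more likely be the decoder or the standardization of z than the sampler.

```python
import math
import random


def perturb_on_sphere(z, radius, rng=None):
    """Return z moved by exactly `radius` in a uniformly random direction.

    z is a plain sequence of floats (hypothetical input, for example a
    196-dimensional latent vector as in the output above).
    """
    rng = rng or random.Random()
    # A vector of independent standard normals has a uniformly
    # distributed direction once it is normalized.
    direction = [rng.gauss(0.0, 1.0) for _ in z]
    norm = math.sqrt(sum(d * d for d in direction))
    return [zi + radius * d / norm for zi, d in zip(z, direction)]


if __name__ == "__main__":
    z0 = [0.0] * 196
    z1 = perturb_on_sphere(z0, 5.0, random.Random(0))
    print(math.sqrt(sum((a - b) ** 2 for a, b in zip(z1, z0))))
```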
I'm really interested in this model, but I couldn't get the code to work on my computer or on a computing cluster because of the outdated version of Tensorflow it uses. Is it possible to update it, or do you have any suggestion on how I can make it work? I would really appreciate your help!
When I loaded a model in the file intro_to_chemvae.ipynb, it reported the following:
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[196,196]
[[Node: z_mean_sample_4/kernel/Assign = Assign[T=DT_FLOAT, _class=["loc:@z_mean_sample_4/kernel"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](z_mean_sample_4/kernel, z_mean_sample_4/random_uniform)]]
Is it due to my server's memory?
My server's memory is 16GB, and the CPU is Intel(R) Xeon(R) CPU E5-1603 v4 @ 2.80GHz.
Hello,
I've been trying to reproduce the results of the zinc_properties model provided in the default repository at ./chemical_vae/models/zinc_properties.
Basically I just cd to the zinc_properties directory and run
python3 -m chemvae.train_vae
for 120 epochs with the default exp.json file, and end up with the three files zinc_decoder.h5, zinc_encoder.h5, zinc_prop_pred.h5.
Now if I use those files in the Jupyter notebook example /chemical_vae/examples/intro_to_chemvae.ipynb, the "encode then decode test" shown below does not work (it neither recovers the original encoded SMILES nor generates similar SMILES using a noise of 5.0), although it all works with the original h5 files.
# Using the VAE
## Decode/Encode
smiles_1 = mu.canon_smiles('CSCC(=O)NNC(=O)c1c(C)oc(C)c1C')
# smiles_1 = mu.canon_smiles('Cc1cc2c(cc1S(=O)(=O)NC1CCC(C)CC1)OCCN2C')
X_1 = vae.smiles_to_hot(smiles_1,canonize_smiles=True)
z_1 = vae.encode(X_1)
X_r= vae.decode(z_1)
print('{:20s} : {}'.format('Input',smiles_1))
print('{:20s} : {}'.format('Reconstruction',vae.hot_to_smiles(X_r,strip=True)[0]))
print('{:20s} : {} with norm {:.3f}'.format('Z representation',z_1.shape, np.linalg.norm(z_1)))
Were the provided h5 files obtained using the .csv and .json files in the same zinc_properties directory of the GitHub repository?
Thank you very much for your work, it is so interesting
Best Regards
Hugues
Hi! I have one question. I used the QM9 dataset (10,000 entries of SMILES and properties).
I tried to run intro_to_chemvae.py, but the errors below occurred.
File "intro_to_chemvae.py", line 19, in
vae = VAEUtils(directory='../models/QM9')
File "/home/anaconda3/envs/chemvae/lib/python3.6/site-packages/chemvae-1.0.0-py3.6.egg/chemvae/vae_utils.py", line 52, in init
File "/home/anaconda3/envs/chemvae/lib/python3.6/site-packages/chemvae-1.0.0-py3.6.egg/chemvae/vae_utils.py", line 61, in estimate_estandarization
File "/home/anaconda3/envs/chemvae/lib/python3.6/site-packages/chemvae-1.0.0-py3.6.egg/chemvae/vae_utils.py", line 289, in random_molecules
File "/home/anaconda3/envs/chemvae/lib/python3.6/random.py", line 320, in sample
raise ValueError("Sample larger than population or is negative")
ValueError: Sample larger than population or is negative
I suspect this error occurs because the QM9 dataset is smaller than the ZINC dataset.
Please let me know if you know the cause.
Best wishes
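The last frame of the traceback is Python's own `random.sample`, which raises this ValueError whenever the requested sample size exceeds the number of available items. That fits the guess that the dataset has fewer molecules than the code tries to draw. The sketch below shows the failure and a clamped alternative. `safe_sample` is a hypothetical helper, and how chemvae chooses its sample size is not confirmed here.

```python
import random


def safe_sample(population, k, seed=None):
    """Sample up to k items without replacement, never more than exist."""
    rng = random.Random(seed)
    # Clamp k so that small datasets do not trigger the ValueError.
    k = min(k, len(population))
    return rng.sample(population, k)


if __name__ == "__main__":
    molecules = ["C", "CC", "CCC", "CCO", "c1ccccc1"]
    try:
        random.sample(molecules, 50)
    except ValueError as err:
        print("random.sample failed:", err)
    print(safe_sample(molecules, 50, seed=0))
```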
Hi,
I have read your paper and think it is wonderful work. The paper mentions that a Gaussian process can be used in the latent space to optimize a molecule for a specific property. Is there an example of this? Thank you
Ji
Firstly, the requirements cannot be installed through conda because they are too outdated.
Keras always returns "AttributeError: can't set attribute" when trying to load the decoder. This is due to the layer in chemvae/tgru_k2_gpu.py.
self.units = .. should be renamed, because Keras/Tensorflow does not allow setting attributes defined on the superclass.
The same goes for the variable self.recurrent_dropout.
After fixing both, I still get the error: Layer decoder_tgru expects 3 inputs, but it received 2 input tensors. Input received: [<tf.Tensor 'decoder_gru2_1/transpose_1:0' shape=(None, 120, 488) dtype=float32>, <tf.Tensor 'decoder_true_seq_input_2:0' shape=(None, 120, 35) dtype=float32>]
I still cannot figure out what is wrong with the layer. Should I use earlier TF and Keras versions to make it work?
When I execute the command python -m chemvae.train_vae, I get the following error:
Traceback (most recent call last):
File "C:\Users\rodri647\AppData\Local\Continuum\anaconda2\envs\chemvae\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 14, in swig_import_helper
return importlib.import_module(mname)
File "C:\Users\rodri647\AppData\Local\Continuum\anaconda2\envs\chemvae\lib\importlib\__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 994, in _gcd_import
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 658, in _load_unlocked
File "<frozen importlib._bootstrap>", line 571, in module_from_spec
File "<frozen importlib._bootstrap_external>", line 922, in create_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: DLL load failed with error code -1073741795
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\rodri647\AppData\Local\Continuum\anaconda2\envs\chemvae\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 58, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "C:\Users\rodri647\AppData\Local\Continuum\anaconda2\envs\chemvae\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 17, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "C:\Users\rodri647\AppData\Local\Continuum\anaconda2\envs\chemvae\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 16, in swig_import_helper
return importlib.import_module('_pywrap_tensorflow_internal')
File "C:\Users\rodri647\AppData\Local\Continuum\anaconda2\envs\chemvae\lib\importlib\__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
ModuleNotFoundError: No module named '_pywrap_tensorflow_internal'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\rodri647\AppData\Local\Continuum\anaconda2\envs\chemvae\lib\site-packages\tensorflow\__init__.py", line 24, in <module>
from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
File "C:\Users\rodri647\AppData\Local\Continuum\anaconda2\envs\chemvae\lib\site-packages\tensorflow\python\__init__.py", line 49, in <module>
from tensorflow.python import pywrap_tensorflow
File "C:\Users\rodri647\AppData\Local\Continuum\anaconda2\envs\chemvae\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 74, in <module>
raise ImportError(msg)
ImportError: Traceback (most recent call last):
File "C:\Users\rodri647\AppData\Local\Continuum\anaconda2\envs\chemvae\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 14, in swig_import_helper
return importlib.import_module(mname)
File "C:\Users\rodri647\AppData\Local\Continuum\anaconda2\envs\chemvae\lib\importlib\__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 994, in _gcd_import
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 658, in _load_unlocked
File "<frozen importlib._bootstrap>", line 571, in module_from_spec
File "<frozen importlib._bootstrap_external>", line 922, in create_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: DLL load failed with error code -1073741795
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\rodri647\AppData\Local\Continuum\anaconda2\envs\chemvae\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 58, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "C:\Users\rodri647\AppData\Local\Continuum\anaconda2\envs\chemvae\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 17, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "C:\Users\rodri647\AppData\Local\Continuum\anaconda2\envs\chemvae\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 16, in swig_import_helper
return importlib.import_module('_pywrap_tensorflow_internal')
File "C:\Users\rodri647\AppData\Local\Continuum\anaconda2\envs\chemvae\lib\importlib\__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
ModuleNotFoundError: No module named '_pywrap_tensorflow_internal'
Failed to load the native TensorFlow runtime.
See https://www.tensorflow.org/install/install_sources#common_installation_problems
for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.
>>>
executing vae = VAEUtils(directory='../models/zinc_properties')
yields this error
AttributeError Traceback (most recent call last)
in ()
----> 1 vae = VAEUtils(directory='../models/zinc_properties')
/home/rad/miniconda3/envs/chemvae/lib/python3.6/site-packages/chemvae-1.0.0-py3.6.egg/chemvae/vae_utils.py in __init__(self, exp_file, encoder_file, decoder_file, directory)
35 self.indices_char = dict((i, c) for i, c in enumerate(chars))
36 # encoder, decoder
---> 37 self.enc = load_encoder(self.params)
38 self.dec = load_decoder(self.params)
39 self.encode, self.decode = self.enc_dec_functions()
/home/rad/miniconda3/envs/chemvae/lib/python3.6/site-packages/chemvae-1.0.0-py3.6.egg/chemvae/models.py in load_encoder(params)
77 # return encoder
78 # !# not sure if this is the right format
---> 79 return load_model(params['encoder_weights_file'])
80
81
/home/rad/miniconda3/envs/chemvae/lib/python3.6/site-packages/keras/models.py in load_model(filepath, custom_objects, compile)
230 if model_config is None:
231 raise ValueError('No model found in config file.')
--> 232 model_config = json.loads(model_config.decode('utf-8'))
233 model = model_from_config(model_config, custom_objects=custom_objects)
234
AttributeError: 'str' object has no attribute 'decode'
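A note on the likely cause, offered as an assumption rather than a confirmed diagnosis: this error is commonly reported when h5py 3.x is installed alongside an older Keras release. h5py 3 returns HDF5 string attributes as str, while the keras/models.py shown in the traceback still calls .decode('utf-8') on the value (line 232). Installing an h5py release below 3.0 in the chemvae environment is the usual workaround. The sketch below only illustrates the bytes/str mismatch; the helper name load_model_config is hypothetical and is not part of Keras or chemvae.

```python
import json


def load_model_config(raw_config):
    """Parse a model config stored as either bytes or str.

    Hypothetical helper: older h5py returned bytes for HDF5 string
    attributes, newer h5py returns str, so decode only when needed.
    """
    if isinstance(raw_config, bytes):
        raw_config = raw_config.decode("utf-8")
    return json.loads(raw_config)


# The same config parses regardless of which type the HDF5 reader returns.
as_bytes = b'{"class_name": "Model"}'
as_str = '{"class_name": "Model"}'
print(load_model_config(as_bytes) == load_model_config(as_str))
```

Calling .decode() unconditionally, as the traceback shows, works only for the bytes case, which is why the str returned by newer h5py raises AttributeError.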
Dear jnwei:
I am interested in real-world applications of Bayesian optimization. In the introduction of your paper, you state: "Gradient-based optimization can be combined with Bayesian optimization methods to select compounds that are likely to be informative about the global optimum." You also say that you use a GP to optimize the surrogate property predictor. I find the description in the paper a little vague, and I cannot tell how the GP is actually used. Could you point me to the file where the GP is used? I can't find it.
Thanks
Wei-Cheng
rdkit is only supported on Python 3.7 and above, but tensorflow 1.1 is only supported up to Python 3.5. I can't install both packages in the same environment.
The project was deleted. An interesting fork can be found here: https://github.com/KnightTec/chemical_vae
When training on a million molecules, should we keep limit_data at 5000 or change it? Which parameters matter when training on a set of 1 million molecules?
Does this package support GPU training, and if so, how can we set it up?
I found the following code at the top of the 'train_vae.py' file, but it was commented out and therefore not used. There was also no further information about it in the documentation, so I was not sure whether GPU training is supported.
from gpu_utils import pick_gpu_lowest_memory
gpu_free_number = str(pick_gpu_lowest_memory())
#
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '{}'.format(gpu_free_number)
Thank you in advance!
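The commented-out lines pick a GPU and expose only that device through CUDA_VISIBLE_DEVICES, which has to be set before TensorFlow is imported. Since gpu_utils is not shown here, the following is a minimal stand-in, assuming nvidia-smi is on the PATH. The names parse_lowest_memory_gpu, pick_gpu and use_least_loaded_gpu are hypothetical and are not the repository's implementation.

```python
import os
import subprocess


def parse_lowest_memory_gpu(csv_text):
    """Return the index of the GPU with the least memory in use.

    Expects one "index, memory_used" pair per line, as printed by
    nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits
    """
    usage = []
    for line in csv_text.strip().splitlines():
        index, used = (field.strip() for field in line.split(","))
        usage.append((int(used), int(index)))
    if not usage:
        raise ValueError("no GPUs listed")
    # min() compares memory first, then index, so ties go to the lower index.
    return min(usage)[1]


def pick_gpu():
    """Query nvidia-smi and return the least-loaded GPU index."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=index,memory.used",
         "--format=csv,noheader,nounits"],
        universal_newlines=True,
    )
    return parse_lowest_memory_gpu(out)


def use_least_loaded_gpu():
    """Expose only the chosen GPU; call before importing TensorFlow or Keras."""
    os.environ["CUDA_VISIBLE_DEVICES"] = str(pick_gpu())
```

Setting the variable after TensorFlow has already initialised its devices has no effect, so the call belongs at the very top of the training script.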
Hi. I am trying to load your model with the following line of code:
vae = VAEUtils(directory='C:\Users\Autoencoder\chemical_vae-master\models\zinc_properties')
but it results in this error:
~\Autoencoder\chemical_vae-master\chemvae\vae_utils.py in __init__(self, exp_file, encoder_file, decoder_file, directory)
55 # encoder, decoder
56 self.enc = load_encoder(self.params)
---> 57 self.dec = load_decoder(self.params)
58 self.encode, self.decode = self.enc_dec_functions()
59 self.data = None
~\Autoencoder\chemical_vae-master\chemvae\vae_utils.py in load_decoder(params)
21 def load_decoder(params):
22 if params['do_tgru']:
---> 23 return load_model(params['decoder_weights_file'], custom_objects={'TerminalGRU': TerminalGRU})
24 else:
25 return load_model(params['decoder_weights_file'])
NameError: name 'TerminalGRU' is not defined
Could you tell me what I should do to make it work?
Do you have code that can generate figures similar to those in your paper? Many thanks.
While running the train_vae script, apparently my GPU isn't being used (the CPU usage is 300%+, but the GPU seems to be unused). My keras.json file specifies that the backend is tensorflow, and the KERAS_BACKEND env variable is also set to tensorflow. Is there something else I can do to use my GPU for training?
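One quick check, offered as a sketch: ask TensorFlow which devices it can see. If no GPU is listed, the installed build may be CPU-only (for most TensorFlow 1.x releases, GPU support shipped in the separate tensorflow-gpu package), or the CUDA/cuDNN libraries may not match the build. gpu_names and visible_gpus are hypothetical helpers; device_lib.list_local_devices() is TensorFlow's own call.

```python
def gpu_names(devices):
    """Return the names of all devices whose device_type is 'GPU'."""
    return [d.name for d in devices if d.device_type == "GPU"]


def visible_gpus():
    """List the GPUs TensorFlow can see (TensorFlow is imported lazily)."""
    from tensorflow.python.client import device_lib
    return gpu_names(device_lib.list_local_devices())


# Usage: an empty list means training will run on the CPU.
# print(visible_gpus())
```

The Keras backend setting only selects TensorFlow as the engine; whether a GPU is used depends on what the TensorFlow installation itself can see.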
Hi, I am trying a test run with zinc_properties and trained on about 100 molecules from the csv. After that I have several *.h5 files. When I went through intro_to_chemvae.ipynb, I got the following error.
NameError Traceback (most recent call last)
in
1 smiles_1 = mu.canon_smiles('CSCC(=O)NNC(=O)c1c(C)oc(C)c1C')
2
----> 3 X_1 = vae.smiles_to_hot(smiles_1,canonize_smiles=True)
4 z_1 = vae.encode(X_1)
5 X_r= vae.decode(z_1)
NameError: name 'vae' is not defined.
Everything else seems fine. Can anyone help me find out what went wrong?
The function estimate_estandarization(self) has an issue: it stops as soon as it encounters a NaN or empty SMILES string. As a temporary workaround, the encoding step for each chunk can be wrapped in a try block so that a failing chunk is reported instead of aborting the run:
try:
    sub_smiles = [smiles[i] for i in chunk]
    one_hot = self.smiles_to_hot(sub_smiles)
    Z[chunk, :] = self.encode(one_hot, False)
except ValueError:
    # Report the offending chunk for inspection instead of stopping
    print(len(sub_smiles))
    print(sub_smiles)
    print(one_hot.shape)
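An alternative to catching the exception, sketched under the assumption that the SMILES come from a CSV column where missing values show up as NaN, None, or blank strings: drop those entries before the data is split into chunks. drop_missing_smiles is a hypothetical helper, not part of chemvae, and it does not check chemical validity.

```python
def drop_missing_smiles(smiles):
    """Keep only non-empty string entries, stripped of whitespace.

    NaN (a float), None, and blank strings are discarded.
    """
    cleaned = []
    for entry in smiles:
        if isinstance(entry, str) and entry.strip():
            cleaned.append(entry.strip())
    return cleaned


raw = ["CCO", float("nan"), "", None, " c1ccccc1 "]
print(drop_missing_smiles(raw))  # ['CCO', 'c1ccccc1']
```

Filtering up front keeps every remaining chunk in the standardization estimate, whereas the try block leaves the rows of a failing chunk unfilled.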