aspuru-guzik-group / chemical_vae



Code for 10.1021/acscentsci.7b00572, now running on Keras 2.0 and Tensorflow

License: Apache License 2.0

Python 100.00%

chemical_vae's People

amoliu lilleswing rgbombarelli spadavec leelasd rescalante-lilly sergeyanufriev dhidru jancajthaml ashwindcruz valaentine wj0600 shubhampachori12110095 thegodone jvaughn575 xiaoxj2 datascience4me hedgefair zhenpengyao onlybelter daishaoxing ruanyangry gucoloradoc maeve-k usccolumbia frank-lb hcji ginchung kaislash duolinwang manvithaponnapati so2jia foxtrotmike sacdallago lucyjimenez stephen2526 rayeesrahman muu4649 zhilibayev thava wangz10 yipgs nheidelberg unixjunkie cheneyyu gmorningbreakfast huangjiancong1 zhongsheng-chen rafalbachorz tonylv songsiwei aspirincode phenylazide arseeq ajayar7 jwarmitage sj-huang robmacc aksub99 abdulelahalshehri whoyouwith91 pascalnotin elgeekim jmche tjustorm awoziji roysh korney3 annbeg khushboog9 hassanmohsin zhangsushen1992 gvaladao jingxual lxlsu cameronbrown100 tslsun neerbhardwaj keshava hyeokhyen zmsunnyday iftekherm imamun93 snsie cherakhan nmrson cgh2797 yingli2009 dexiongyung jpatrick9793 evrentoptop gan-zi monge88 bhavikajain0001 shabbirk nateharms abhishekkumards robertlizatovic aasthas3 howartha

chemical_vae's Issues

Extract vector representation of molecules

Is it possible to use the program only to encode a set of molecules from an SDF file into their vector representations? I would like to use these representations as feature vectors in my code for binding affinity prediction.

Runtime for training on the ZINC dataset

Hello,

What is the average runtime for training on the ZINC dataset (70 epochs)? I'm running this on a dual-GPU workstation and a single epoch takes a very long time. Are there additional parameters I need to change?

Thanks!

Separate training of the reconstruction task and property prediction

Dear All,

The paper mentions that 250,000 drug-like molecules were used to train the autoencoder, and 2,000 molecules were used to train the Gaussian process.

However, the provided command-line tool accepts only one input.

Therefore my question: is it possible to train the property prediction and the reconstruction task separately, and if so, how was this separate training done in the paper?

How should the code be run, assuming we have the following two datasets?
• a limited number of molecules with SMILES and properties, and
• a large number of molecules with SMILES but without properties (possibly overlapping with the smaller dataset)

Kind regards!
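One common way to train on a large unlabelled set and a small labelled set in a single run is to store missing property values as NaN and mask them out of the property-prediction loss. The NumPy sketch below illustrates only that masking; `masked_mse` is hypothetical, and nothing in the issue confirms that `chemvae.train_vae` accepts partially labelled data or that the paper trained this way.

```python
import numpy as np


def masked_mse(y_true, y_pred):
    """Mean squared error over the entries that actually have a label.

    Hypothetical helper: missing property values are stored as NaN in
    y_true and are excluded from both the sum and the normalisation.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = ~np.isnan(y_true)              # True where a property value exists
    if not mask.any():
        return 0.0                        # batch with no labelled molecules
    diff = np.where(mask, y_true - y_pred, 0.0)
    return float(np.sum(diff ** 2) / np.sum(mask))


# Two labelled molecules and one unlabelled molecule (NaN).
print(masked_mse([1.0, np.nan, 3.0], [1.5, 10.0, 2.0]))  # → 0.625
```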

segmentation fault

How to train your VAE

If I have a new dataset, how can I use your code to train on it? It would be great if you could describe the procedure. Is there any documentation for this code? Many thanks.
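Training on a new dataset generally needs a character set and a maximum SMILES length that cover it; the ZINC decoder in this repository works on tensors of shape (?, 120, 35), i.e. sequences of 120 characters over 35 symbols, as reported in other issues on this page. The hypothetical helper below derives both values from a list of SMILES, assuming one token per character and a space as the padding symbol; how chemvae itself stores these settings is not shown here.

```python
def build_charset(smiles_list, pad_char=" "):
    """Collect the sorted character set and maximum length of a SMILES dataset.

    Character-level tokenisation is an assumption: two-letter atoms such
    as 'Cl' are split into 'C' and 'l'.
    """
    chars = set(pad_char)              # padding symbol used to fill short strings
    max_len = 0
    for smi in smiles_list:
        chars.update(smi)              # add every character of this SMILES
        max_len = max(max_len, len(smi))
    return sorted(chars), max_len


charset, max_len = build_charset(["CCO", "c1ccccc1", "CCl"])
print(charset)   # → [' ', '1', 'C', 'O', 'c', 'l']
print(max_len)   # → 8
```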

Dense layer in variational_layers is not saved and cannot be used to generate latent variables

I found that variational_layers computes the posterior log variances of the latent variables from the last encoding layer through a Dense layer. However, this Dense layer in variational_layers is not saved after training, so users cannot obtain the latent posterior log variances or posterior samples.

The example provided only uses the posterior means to decode to SMILES. Is it actually possible, after training, to encode SMILES into posterior samples rather than posterior means?
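If the log-variance layer were available (for example by saving and reloading it, which the issue says the repository does not do), posterior samples could be drawn with the standard VAE reparameterisation trick, z = μ + exp(½·log σ²)·ε with ε ~ N(0, I). The NumPy sketch below assumes you already have `z_mean` and `z_log_var` arrays for one molecule; the function name is hypothetical.

```python
import numpy as np


def sample_posterior(z_mean, z_log_var, n_samples=1, rng=None):
    """Draw n_samples latent vectors z ~ N(z_mean, diag(exp(z_log_var))).

    Uses the reparameterisation trick z = mu + sigma * eps with eps ~ N(0, I).
    z_mean and z_log_var are assumed to be 1-D arrays of equal length.
    """
    rng = np.random.RandomState() if rng is None else rng
    mu = np.asarray(z_mean, dtype=float)
    sigma = np.exp(0.5 * np.asarray(z_log_var, dtype=float))  # sigma = exp(log_var / 2)
    eps = rng.standard_normal((n_samples, mu.shape[0]))         # one eps row per sample
    return mu + sigma * eps


samples = sample_posterior(np.zeros(196), np.full(196, -2.0), n_samples=5,
                           rng=np.random.RandomState(0))
print(samples.shape)  # → (5, 196)
```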

Trouble Setting up Chemvae

I used conda to create the environment and then selected the "chemvae" interpreter in VS Code. When I try to run the first cell of the example script (all of the imports), I get the error below. I tried removing and re-adding the environment, and I also tried PyCharm instead of VS Code; neither fixed the problem. I am new to conda, so please bear with me. Thanks in advance.

TypeError Traceback (most recent call last)
in
3 environ['KERAS_BACKEND'] = 'tensorflow'
4 # vae stuff
----> 5 from chemvae.vae_utils import VAEUtils
6 from chemvae import mol_utils as mu
7 # import scientific py

c:\Users\yhtru\anaconda3\envs\chemvae\lib\importlib\_bootstrap.py in _find_and_load(name, import_)

c:\Users\yhtru\anaconda3\envs\chemvae\lib\importlib\_bootstrap.py in _find_and_load_unlocked(name, import_)

c:\Users\yhtru\anaconda3\envs\chemvae\lib\importlib\_bootstrap.py in _load_unlocked(spec)

c:\Users\yhtru\anaconda3\envs\chemvae\lib\importlib\_bootstrap.py in _load_backward_compatible(spec)

c:\Users\yhtru\anaconda3\envs\chemvae\lib\site-packages\chemvae-1.0.0-py3.6.egg\chemvae\vae_utils.py in
3 import random
4 import yaml
----> 5 from .models import load_encoder, load_decoder, load_property_predictor
6 import numpy as np
7 import pandas as pd

c:\Users\yhtru\anaconda3\envs\chemvae\lib\importlib\_bootstrap.py in _find_and_load(name, import_)
...
---> 82 class ArgumentParser(Generic[_T], metaclass=_ArgumentParserCache):
83 """Base class used to parse and convert arguments.
84

TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases
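The error is ordinary Python behaviour: a class cannot be created when its bases bring unrelated metaclasses. The last frame is absl-py's `class ArgumentParser(Generic[_T], metaclass=_ArgumentParserCache)`, and on Python 3.6 `typing.Generic` still has its own metaclass, which suggests that the installed absl-py release expects a newer Python. Pinning an older absl-py in the `chemvae` environment is therefore a likely workaround, although the exact compatible version is not established here. The stand-in classes below are hypothetical and only reproduce the conflict:

```python
class CacheMeta(type):
    """Stand-in for a custom metaclass such as absl's _ArgumentParserCache."""


class GenericLikeMeta(type):
    """Stand-in for the metaclass behind typing.Generic on Python 3.6."""


class GenericLike(metaclass=GenericLikeMeta):
    pass


def make_conflicting_class():
    # Python needs one metaclass that is a subclass of every base's metaclass.
    # CacheMeta and GenericLikeMeta are unrelated, so class creation fails.
    class ArgumentParserLike(GenericLike, metaclass=CacheMeta):
        pass
    return ArgumentParserLike


try:
    make_conflicting_class()
except TypeError as exc:
    print(exc)  # prints the same "metaclass conflict" message as the traceback
```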

FileNotFoundError: [Errno 2] No such file or directory: 'zinc.json'

Hi,
could you please tell me how to solve this problem? The zinc.json file exists in the folder, but running the code raises an error saying that the file does not exist.
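A frequent cause of this symptom is that a bare relative name such as `'zinc.json'` is resolved against the current working directory, not against the directory of the script or notebook, so the file can exist and still not be found when the code is launched from another folder. A minimal sketch that makes the lookup explicit (`resolve_config` and its `base_dir` parameter are hypothetical, not part of chemvae):

```python
from pathlib import Path


def resolve_config(filename, base_dir=None):
    """Return the absolute path of a config file such as 'zinc.json'.

    A bare relative name is resolved against the current working directory,
    not the script's directory; passing base_dir (hypothetical parameter)
    makes the lookup independent of where Python was launched from.
    """
    path = Path(filename)
    if base_dir is not None and not path.is_absolute():
        path = Path(base_dir) / path
    path = path.resolve()
    if not path.exists():
        raise FileNotFoundError("{} not found (current working directory: {})".format(
            path, Path.cwd()))
    return path


# Example: look next to this script instead of in the working directory.
# config = resolve_config("zinc.json", base_dir=Path(__file__).parent)
```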

I am trying to use your model to get the encoded representation of SMILES.

When I run the command with a dataset specified in exp.json, it trains the model but does not write a file with the final encoded representations of the SMILES. How can I change that?
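Since training, as described, does not write the encoded SMILES anywhere, the latent vectors have to be computed with the trained encoder afterwards and saved separately. A minimal sketch, assuming an `encode_fn` built from the `vae.smiles_to_hot` and `vae.encode` calls used in the repository's example notebook; `save_latent_vectors`, `my_smiles` and the CSV layout are hypothetical choices:

```python
import csv

import numpy as np


def save_latent_vectors(smiles_list, encode_fn, path):
    """Write one CSV row per molecule: the SMILES followed by its latent vector.

    encode_fn is a hypothetical callable mapping one SMILES string to a latent
    vector (any shape that flattens to 1-D, e.g. (1, 196)). Returns the latent
    dimension that was written.
    """
    rows = []
    for smi in smiles_list:
        z = np.asarray(encode_fn(smi), dtype=float).ravel()  # (1, d) -> (d,)
        rows.append([smi] + z.tolist())
    dim = len(rows[0]) - 1 if rows else 0
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["smiles"] + ["z_{}".format(i) for i in range(dim)])
        writer.writerows(rows)
    return dim


# With a trained chemvae model (calls taken from the example notebook):
# encode_fn = lambda s: vae.encode(vae.smiles_to_hot(s, canonize_smiles=True))
# save_latent_vectors(my_smiles, encode_fn, "latent_vectors.csv")
```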

chemical_vae install

Hi all,
I installed chemical_vae with the following commands:

~/anaconda2/bin/pip install -r requirements.txt
~/anaconda2/bin/conda env create -f environment.yml
source ~/anaconda2/bin/activate chemvae
python setup.py install

When I run "python -m chemvae.train_vae", it gives the following errors:

/home/lili/anaconda2/envs/chemvae/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
  return f(*args, **kwds)
/home/lili/anaconda2/envs/chemvae/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
  from ._conv import register_converters as _register_converters
Using Theano backend.
Traceback (most recent call last):
  File "/home/lili/anaconda2/envs/chemvae/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/lili/anaconda2/envs/chemvae/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/lili/Programs/chemical_vae-master/chemvae/train_vae.py", line 30, in <module>
    from .models import encoder_model, load_encoder
  File "/home/lili/Programs/chemical_vae-master/chemvae/models.py", line 10, in <module>
    from .tgru_k2_gpu import TerminalGRU
  File "/home/lili/Programs/chemical_vae-master/chemvae/tgru_k2_gpu.py", line 72, in <module>
    raise NotImplemented("Backend not implemented")
TypeError: 'NotImplementedType' object is not callable

How can I solve this problem? Thank you very much.
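The traceback combines two problems. The log shows `Using Theano backend.`, so Keras was not running on TensorFlow, and the backend check at line 72 of `tgru_k2_gpu.py` then tries to `raise NotImplemented(...)`. `NotImplemented` is a built-in constant rather than an exception class, so calling it produces the `TypeError` shown instead of a readable message. Setting the `KERAS_BACKEND` environment variable to `tensorflow` before Keras is imported (as the example notebook does with `environ['KERAS_BACKEND'] = 'tensorflow'`) addresses the first problem; the sketch below illustrates the second with a hypothetical `check_backend` function that is not the repository's code.

```python
def check_backend(backend):
    """Guard that only accepts the TensorFlow backend (hypothetical stand-in).

    NotImplementedError is the exception class to raise; NotImplemented is a
    constant intended as a return value from binary-operator methods.
    """
    if backend != "tensorflow":
        raise NotImplementedError("Backend not implemented: {}".format(backend))
    return backend


# What the original line does: calling the NotImplemented constant fails first.
try:
    raise NotImplemented("Backend not implemented")
except TypeError as exc:
    print(exc)  # → 'NotImplementedType' object is not callable

# With the correct exception class, the real reason is reported.
try:
    check_backend("theano")
except NotImplementedError as exc:
    print(exc)  # → Backend not implemented: theano
```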

Errors when running examples/intro_to_chemvae.ipynb

When I run the "Decode several attempts" part of intro_to_chemvae.ipynb, I get the following errors:

Searching molecules randomly sampled from 5.00 std (z-distance) from the point
Found 0 unique mols, out of 0
SMILES
Series([], Name: smiles, dtype: object)

AttributeError Traceback (most recent call last)
D:\Anaconda3\envs\chemvae\lib\site-packages\PIL\ImageFile.py in _save(im, fp, tile, bufsize)
481 try:
--> 482 fh = fp.fileno()
483 fp.flush()

AttributeError: '_idat' object has no attribute 'fileno'

During handling of the above exception, another exception occurred:

SystemError Traceback (most recent call last)
D:\Anaconda3\envs\chemvae\lib\site-packages\IPython\core\formatters.py in __call__(self, obj)
334 method = get_real_method(obj, self.print_method)
335 if method is not None:
--> 336 return method()
337 return None
338 else:

D:\Anaconda3\envs\chemvae\lib\site-packages\PIL\Image.py in _repr_png_(self)
655 from io import BytesIO
656 b = BytesIO()
--> 657 self.save(b, 'PNG')
658 return b.getvalue()
659

D:\Anaconda3\envs\chemvae\lib\site-packages\PIL\Image.py in save(self, fp, format, **params)
1928
1929 try:
-> 1930 save_handler(self, fp, filename)
1931 finally:
1932 # do what we can to clean up

D:\Anaconda3\envs\chemvae\lib\site-packages\PIL\PngImagePlugin.py in _save(im, fp, filename, chunk)
819
820 ImageFile._save(im, _idat(fp, chunk),
--> 821 [("zip", (0, 0)+im.size, 0, rawmode)])
822
823 chunk(fp, b"IEND", b"")

D:\Anaconda3\envs\chemvae\lib\site-packages\PIL\ImageFile.py in _save(im, fp, tile, bufsize)
488 if o > 0:
489 fp.seek(o, 0)
--> 490 e.setimage(im.im, b)
491 if e.pushes_fd:
492 e.setfd(fp)

SystemError: tile cannot extend outside image

<PIL.Image.Image image mode=RGBA size=1000x0 at 0x2100EA6F128>
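The PIL error follows from the earlier output: the search reported `Found 0 unique mols, out of 0`, so the grid image has zero rows and therefore a height of 0 (`size=1000x0`), which PIL cannot encode. The arithmetic is sketched below, assuming a row-by-row layout like RDKit's grid drawing with 5 molecules per row and 200-pixel tiles (both values are assumptions chosen to match the reported width). Checking that at least one molecule was found before drawing avoids the crash, although the empty search result itself still needs explaining.

```python
def grid_image_size(n_mols, mols_per_row=5, sub_img_size=(200, 200)):
    """Width and height of a molecule grid image laid out row by row.

    Assumed to follow the same arithmetic as an RDKit-style grid drawing;
    the default values are illustrative.
    """
    n_rows = n_mols // mols_per_row
    if n_mols % mols_per_row:
        n_rows += 1                      # a partly filled last row still needs space
    return mols_per_row * sub_img_size[0], n_rows * sub_img_size[1]


print(grid_image_size(0))   # → (1000, 0): zero height, as in the error above
print(grid_image_size(12))  # → (1000, 600)
```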

Trying the conda method

python -m chemvae.train_vae
/usr/local/anaconda3/envs/chemvae/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:523: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/anaconda3/envs/chemvae/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:524: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/anaconda3/envs/chemvae/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/anaconda3/envs/chemvae/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/anaconda3/envs/chemvae/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/anaconda3/envs/chemvae/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:532: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
Using Theano backend.
Traceback (most recent call last):
  File "/usr/local/anaconda3/envs/chemvae/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/anaconda3/envs/chemvae/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/berenger/src/chemical_vae/chemvae/train_vae.py", line 30, in <module>
    from .models import encoder_model, load_encoder
  File "/home/berenger/src/chemical_vae/chemvae/models.py", line 10, in <module>
    from .tgru_k2_gpu import TerminalGRU
  File "/home/berenger/src/chemical_vae/chemvae/tgru_k2_gpu.py", line 73, in <module>
    raise NotImplemented("Backend not implemented")
TypeError: 'NotImplementedType' object is not callable

Hello

Is there a version of this that runs on tensorflow>=2.0?

The code is difficult to work with

Where can I find the model prediction result?

Where can I find the model's prediction results?

Multiple installation issues under Windows and Linux

After about three hours of tinkering on (Arch) Linux without getting it to run with Miniconda and optirun (running the code on an Intel/NVIDIA laptop GPU), I gave up and tried to get it running on Windows.

So far I've got multiple remarks:

Following your instructions, I end up with tensorflow 1.10, not 1.1 as stated in readme.txt. The same goes for keras 2.07 vs. keras 2.06. Is this a typo?
On Windows it's just activate chemvae, without source.
There is a Jupyter example, but Jupyter is not required in environment.yml, so even if Jupyter is installed natively the example doesn't work.
Also, train_vae NEEDS a GPU device, yet tensorflow-gpu is not installed by environment.yml (it is listed in requirements.txt instead, which adds to the confusion). I found two fixes, both of which also take care of the CUDA + cuDNN requirements:
- Remove tensorflow completely from the requirements and, after installing, run conda install tensorflow-gpu. This makes Theano the default backend, so C:\Users\<username>\Anaconda3\envs\chemvae\etc\conda\activate.d\keras_activate.bat must be edited to set the backend to tensorflow.
- Alternatively, change tensorflow to tensorflow-gpu and run pip install --upgrade tensorflow-gpu==1.10 after installing; otherwise my GPU device is not recognized properly. This also requires editing keras_activate.bat.

How to reproduce: install the Visual Studio 2015 build tools on a fresh install of Windows 10, run the install script, and then run python -m chemvae.train_vae.
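Keras 2.x picks its backend from the `KERAS_BACKEND` environment variable when it is set, and otherwise from the `backend` entry in `~/.keras/keras.json`; the `keras_activate.bat` script mentioned above appears to be where this conda environment sets that variable, which would explain why editing it switches the backend. The hypothetical helper below reports which backend an environment is expected to load under that precedence (the fallback default is an assumption); once Keras is importable, `keras.backend.backend()` returns the backend actually in use.

```python
import json
import os


def effective_keras_backend(env=None, config_path=None, default="tensorflow"):
    """Report which backend Keras 2.x is expected to load.

    Assumed precedence: the KERAS_BACKEND environment variable, then the
    "backend" entry of keras.json, then `default`.
    """
    env = os.environ if env is None else env
    if config_path is None:
        config_path = os.path.join(os.path.expanduser("~"), ".keras", "keras.json")
    if env.get("KERAS_BACKEND"):
        return env["KERAS_BACKEND"]          # environment variable wins
    if os.path.exists(config_path):
        with open(config_path) as fh:
            backend = json.load(fh).get("backend")
        if backend:
            return backend                   # fall back to keras.json
    return default


print(effective_keras_backend())  # run inside the activated chemvae environment
```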

AttributeError: can't set attribute

I am using Anaconda Python 3.6 with the latest Keras and other libraries. When I run
python -m chemvae.train_vae
It complains:
Traceback (most recent call last):
  File "Anaconda3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "chemical_vae\chemvae\train_vae.py", line 399, in <module>
    main_no_prop(params)
  File "chemical_vae\chemvae\train_vae.py", line 229, in main_no_prop
    AE_only_model, encoder, decoder, kl_loss_var = load_models(params)
  File "chemical_vae\chemvae\train_vae.py", line 161, in load_models
    decoder = decoder_model(params)
  File "chemical_vae\chemvae\models.py", line 137, in decoder_model
    implementation=params['terminal_GRU_implementation'])([x_dec, true_seq_in])
  File "chemical_vae\chemvae\tgru_k2_gpu.py", line 86, in __init__
    self.units = units
AttributeError: can't set attribute

I tried keras-team/keras#7736
And changed the related lines to:
try:
    self.units = units
except AttributeError:
    self._units = units
try:
    self.recurrent_dropout = min(1., max(0., recurrent_dropout))
except AttributeError:
    self._recurrent_dropout = min(1., max(0., recurrent_dropout))

Then it complains:
ValueError: Layer decoder_tgru expects 3 inputs, but it received 2 input tensors. Input received: [<tf.Tensor 'decoder_gru2_2/transpose_1:0' shape=(?, ?, 488) dtype=float32>, <tf.Tensor 'decoder_true_seq_input_2:0' shape=(?, 120, 35) dtype=float32>]

Please help...
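The first error is the general Python behaviour of assigning to a property that has no setter. In newer Keras 2 releases, `GRU.units` appears to be a read-only property delegating to the layer's cell, while `TerminalGRU` was written for an API where it was a plain attribute. The stand-in classes below are hypothetical and only reproduce the mechanism. Renaming the attribute, as tried above, gets past this error, but the follow-up `expects 3 inputs` error suggests that more of the layer depends on the older recurrent-layer API, so installing the Keras 2.0 version the repository description names is likely the more reliable route.

```python
class NewStyleGRU:
    """Stand-in for a newer Keras GRU whose `units` is a read-only property."""

    def __init__(self, units):
        self._cell_units = units      # stands in for the cell that owns the value

    @property
    def units(self):
        return self._cell_units


class OldStyleTerminalGRU(NewStyleGRU):
    """Subclass written for an API where `units` was a plain attribute."""

    def __init__(self, units):
        super().__init__(units)
        self.units = units            # fails: the property has no setter


try:
    OldStyleTerminalGRU(488)
except AttributeError as exc:
    print(type(exc).__name__, exc)  # exact message text varies across Python versions
```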

AttributeError: 'TerminalGRU' object has no attribute 'preprocess_input'

Running the 'intro_to_chemvae.ipynb' notebook returns this:

AttributeError: in user code:

/home/sc/ml/chemical_space/chemical_vae-main/chemvae/tgru_k2_gpu.py:207 call *
preprocessed_input = self.preprocess_input(X)

AttributeError: 'TerminalGRU' object has no attribute 'preprocess_input'

Has anybody run into this issue?
Any answers or discussion would be appreciated.

Which RDKit version is used in this project?

Which RDKit version is used in this project? I often get errors like the following when running tests:
Traceback (most recent call last):
  File "D:\VAE\chemical_vae-main\chemical_vae-main\chemvae\create.py", line 5, in <module>
    from chemvae.vae_utils import VAEUtils
  File "D:\VAE\chemical_vae-main\chemical_vae-main\chemvae\vae_utils.py", line 1, in <module>
    from . import mol_utils as mu
  File "D:\VAE\chemical_vae-main\chemical_vae-main\chemvae\mol_utils.py", line 4, in <module>
    from rdkit.Chem import AllChem as Chem
  File "C:\Users\ASUS\.conda\envs\rdkit\lib\site-packages\rdkit\__init__.py", line 2, in <module>
    from .rdBase import rdkitVersion as version
ImportError: DLL load failed

bug in "intro_to_chemvae.ipynb"

In the last cell, f = pd.DataFrame(np.transpose((Z_tsne[:,0],Z_tsne[:,1]))) should be changed to
df = pd.DataFrame(np.transpose((Z_tsne[:,0],Z_tsne[:,1])))
Otherwise, there is a DataFrame error.

Metaclass conflict

Hi! I wanted to try running the intro_to_chemvae.ipynb example. I installed the environment via Anaconda, but I encountered the following issue when importing VAEUtils from chemvae.vae_utils:

TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases

I have the same problem when running the autoencoder on the ZINC dataset as described in the example (python -m chemvae.train_vae)

Any suggestions on how to fix this? Thanks!

Simone

~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/importlib/_bootstrap.py in _find_and_load(name, import_)

~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/importlib/_bootstrap.py in _find_and_load_unlocked(name, import_)

~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/importlib/_bootstrap.py in _load_unlocked(spec)

~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/importlib/_bootstrap.py in _load_backward_compatible(spec)

~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/site-packages/chemvae-1.0.0-py3.6.egg/chemvae/vae_utils.py in
3 import random
4 import yaml
----> 5 from .models import load_encoder, load_decoder, load_property_predictor
6 import numpy as np
7 import pandas as pd

~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/importlib/_bootstrap.py in _find_and_load(name, import_)

~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/importlib/_bootstrap.py in _find_and_load_unlocked(name, import_)

~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/importlib/_bootstrap.py in _load_unlocked(spec)

~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/importlib/_bootstrap.py in _load_backward_compatible(spec)

~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/site-packages/chemvae-1.0.0-py3.6.egg/chemvae/models.py in
----> 1 from keras.layers import Input, Lambda
2 from keras.layers.core import Dense, Flatten, RepeatVector, Dropout
3 from keras.layers.convolutional import Convolution1D
4 from keras.layers.recurrent import GRU
5 from keras.layers.normalization import BatchNormalization

~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/site-packages/keras/__init__.py in
1 from __future__ import absolute_import
2
----> 3 from . import utils
4 from . import activations
5 from . import applications

~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/site-packages/keras/utils/__init__.py in
4 from . import data_utils
5 from . import io_utils
----> 6 from . import conv_utils
7
8 # Globally-importable utils.

~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/site-packages/keras/utils/conv_utils.py in
1 from six.moves import range
2 import numpy as np
----> 3 from .. import backend as K
4
5

~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/site-packages/keras/backend/__init__.py in
81 elif _BACKEND == 'tensorflow':
82 sys.stderr.write('Using TensorFlow backend.\n')
---> 83 from .tensorflow_backend import *
84 else:
85 raise ValueError('Unknown backend: ' + str(_BACKEND))

~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py in
----> 1 import tensorflow as tf
2 from tensorflow.python.training import moving_averages
3 from tensorflow.python.ops import tensor_array_ops
4 from tensorflow.python.ops import control_flow_ops
5 from tensorflow.python.ops import functional_ops

~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/site-packages/tensorflow/__init__.py in
20
21 # pylint: disable=g-bad-import-order
---> 22 from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
23
24 try:

~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/site-packages/tensorflow/python/__init__.py in
61
62 # Framework
---> 63 from tensorflow.python.framework.framework_lib import * # pylint: disable=redefined-builtin
64 from tensorflow.python.framework.versions import *
65 from tensorflow.python.framework import errors

~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/site-packages/tensorflow/python/framework/framework_lib.py in
23 # Classes used when building a Graph.
24 from tensorflow.python.framework.device import DeviceSpec
---> 25 from tensorflow.python.framework.ops import Graph
26 from tensorflow.python.framework.ops import Operation
27 from tensorflow.python.framework.ops import Tensor

~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in
53 from tensorflow.python.framework import versions
54 from tensorflow.python.ops import control_flow_util
---> 55 from tensorflow.python.platform import app
56 from tensorflow.python.platform import tf_logging as logging
57 from tensorflow.python.util import compat

~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/site-packages/tensorflow/python/platform/app.py in
22 import sys as _sys
23
---> 24 from tensorflow.python.platform import flags
25 from tensorflow.python.util.tf_export import tf_export
26

~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/site-packages/tensorflow/python/platform/flags.py in
23
24 # go/tf-wildcard-import
---> 25 from absl.flags import * # pylint: disable=wildcard-import
26 import six as _six
27

~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/site-packages/absl/flags/__init__.py in
33 import warnings
34
---> 35 from absl.flags import _argument_parser
36 from absl.flags import _defines
37 from absl.flags import _exceptions

~/software/pkg/miniconda3/envs/chemvae/lib/python3.6/site-packages/absl/flags/_argument_parser.py in
80
81
---> 82 class ArgumentParser(Generic[_T], metaclass=_ArgumentParserCache):
83 """Base class used to parse and convert arguments.
84

TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases

Paper

...For time-series...

Hi, I have a question: does this code work for time series?

Errors with vae_utils.py

Dear friends,

I tried running the code on my own input SMILES. It worked well for some molecules, but I also came across some problems I could NOT figure out.

Traceback (most recent call last):
in
z_1 = vae.encode(X_1)
File "/***/chemical_vae-master/chemvae/vae_utils.py", line 163, in encode
return self.standardize_z(self.enc.predict(X)[0])
IndexError: list index out of range

Any ideas how to fix it?

Thanks,
Cheng
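A plausible cause, not confirmed in the issue, is input the model cannot represent: a SMILES longer than the model's maximum length (sequences of 120 characters over 35 symbols, going by the decoder tensor shapes reported in other issues here) or containing characters outside the training character set. Validating inputs before calling `vae.encode` makes such cases explicit. The function below is a hypothetical sketch, not chemvae's `smiles_to_hot`; the character set passed to it is whatever list your model was trained with.

```python
import numpy as np


def smiles_to_one_hot(smiles, charset, max_len=120, pad_char=" "):
    """One-hot encode a SMILES string into shape (1, max_len, len(charset)).

    Hypothetical checker: raises ValueError for strings that are too long or
    that contain characters outside charset, instead of encoding them silently.
    """
    index = {ch: i for i, ch in enumerate(charset)}
    if pad_char not in index:
        raise ValueError("charset must contain the padding character")
    if len(smiles) > max_len:
        raise ValueError("SMILES has {} characters, limit is {}".format(len(smiles), max_len))
    unknown = sorted(set(smiles) - set(index))
    if unknown:
        raise ValueError("characters not in charset: {}".format(unknown))
    one_hot = np.zeros((1, max_len, len(charset)), dtype=np.float32)
    for pos, ch in enumerate(smiles.ljust(max_len, pad_char)):  # pad to max_len
        one_hot[0, pos, index[ch]] = 1.0
    return one_hot
```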

Exact Python package versions

Hi again,

could you paste the exact versions of the Python packages needed to successfully run your code on a GPU?

The problem is that when installing all packages with your environment.yml and setup.py files, the Python package versions are not compatible, and computations only run on the CPU.

What's the difference between the two models in your code?

In the models directory, I noticed that there are two model subdirectories. What's the difference between these two models?

possible issue with random sampling

Hi guys,

I have encountered the following "weird" behaviour when I sample the latent space near a given SMILES: the output molecules change surprisingly little with the specified noise level. My installation seems to be okay, since it reproduces the examples (I'm using a CPU-based installation), so I wonder whether I am missing something. I provide some examples below, but the same happens for many other molecules. (For the cases here I take only 100 samples, but for "production" work I take tens of thousands, and the pattern remains.)

Noise 200:
$ python get_vae_smiles.py "CSCC(=O)NNC(=O)c1c(C)oc(C)c1C" 2>/dev/null
Using standarized functions? True
Standarization: estimating mu and std values ...done!
Input : CSCC(=O)NNC(=O)c1c(C)oc(C)c1C
Reconstruction : CSCC(=O)N(C(=O)c1c(C)oc(C)c1C
Z representation : (1, 196) with norm 10.705
Searching molecules randomly sampled from 200.00 std (z-distance) from the point
Found 10 unique mols, out of 30
SMILES
0 CSCC(=O)NNC(=O)c1c(C)oc(C)c1C
1 CSC(C=O)NNC(=O)c1c(C)oc(C)c1C
2 COCC(=O)NC(C=O)c1c(C)oc(C)c1C
3 CSCC(=O)NCC(=O)c1c(C)oc(C)c1C
4 COCC(=O)NCC(=O)c1c(C)oc(C)c1C
5 CSC(C=O)NCC(=O)c1c(C)oc(C)c1C
6 COCC(=O)NCC(=O)c1c(C)oc(C)c1Cl
7 CSC(C=O)NCC(=O)c1c(F)oc(C)c1C
8 COCC(=O)NC(=O)c1cc(O)nc(C)c1C
9 C#COC(=N)NC(=O)c1ccccc(Cl)cc1Cl
Name: smiles, dtype: object

Noise 2:
Searching molecules randomly sampled from 2.00 std (z-distance) from the point
Found 13 unique mols, out of 75
SMILES
0 CSCC(=O)NNC(=O)c1c(C)oc(C)c1C
1 CSC(C=O)NNC(=O)c1c(C)oc(C)c1C
2 CSCC(=O)NC(C=O)c1c(C)oc(C)c1C
3 COCC(=O)NC(C=O)c1c(C)oc(C)c1C
4 CSCC(=O)NCC(=O)c1c(C)oc(C)c1C
5 COC(C=O)NNC(=O)c1c(C)oc(C)c1C
6 COCC(=O)NCC(=O)c1c(C)oc(C)c1C
7 CSCC(=O)NCC(=O)c1c(O)oc(C)c1C
8 CSC(C=O)NCC(=O)c1c(C)oc(C)c1C
9 CSC(C=O)NCC(=O)c1c(F)oc(C)c1C
10 COC(C=O)NCC(=O)c1c(C)oc(C)c1C
11 CSCC(=O)N/C(=O)c1c(C)oc(C)c1C
12 ClCC(=O)NCC(=O)c1c(C)oc(C)c1C
Name: smiles, dtype: object

Searching molecules randomly sampled from 50.00 std (z-distance) from the point
Found 14 unique mols, out of 65
SMILES
0 CSCC(=O)NNC(=O)c1c(C)oc(C)c1C
1 COCC(=O)NNC(=O)c1c(C)oc(C)c1C
2 CSC(C=O)NNC(=O)c1c(C)oc(C)c1C
3 COCC(=O)NC(C=O)c1c(C)oc(C)c1C
4 CSCC(=O)NCC(=O)c1c(C)oc(C)c1C
5 CSC(C=O)NC(C=O)c1c(C)oc(C)c1C
6 COCC(=O)NCC(=O)c1c(C)oc(C)c1C
7 CSC(C=O)NCC(=O)c1c(C)oc(C)c1C
8 CSC(C=O)NCC(=O)c1c(F)oc(C)c1C
9 COC(C=O)NCC(=O)c1c(C)oc(C)c1C
10 CSCC(=O)N/C(=O)c1c(C)oc(C)c1C
11 ClC(C=O)NCC(=O)c1c(C)oc(C)c1C
12 ClCC(=O)NCC(=O)c1c(C)oc(C)c1C
13 ClCC(=O)NC(C=O)c1c(C)oc(C)c1C
Name: smiles, dtype: object

So it seems that for large Z distances the SMILES are not much different from those for small distances. What is the distribution of the random sampling? I would expect this if the random sampling were not uniform but heavily biased towards the coordinates of the input SMILES, so that the specified noise level affects only the periphery and most output molecules still come from the close neighbourhood of the SMILES.

I would greatly appreciate any help with this issue.

Best wishes,
Gyorgy Abrusan
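One way to narrow this down is to generate perturbed latent points yourself at a known Euclidean distance, check those distances, and only then decode them, so that the sampler and the decoder can be examined separately. The sketch below assumes isotropic directions and a fixed norm; `perturb_at_distance` is hypothetical, and the repository's own sampler may differ (for example by drawing a random amplitude up to the requested value, or by working in standardised rather than raw latent coordinates), which would change how the requested "std" maps to actual distances.

```python
import numpy as np


def perturb_at_distance(z, distance, n_samples, rng=None):
    """Return n_samples points lying exactly `distance` away from z.

    Assumption for illustration: directions are uniform on the sphere
    (normalised Gaussian draws) and the norm is fixed, not random.
    """
    rng = np.random.RandomState() if rng is None else rng
    z = np.asarray(z, dtype=float).ravel()
    directions = rng.standard_normal((n_samples, z.shape[0]))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)  # unit vectors
    return z + distance * directions


z = np.random.RandomState(0).standard_normal(196)   # placeholder latent point
for d in (2.0, 50.0, 200.0):
    zs = perturb_at_distance(z, d, 100, rng=np.random.RandomState(1))
    dists = np.linalg.norm(zs - z, axis=1)
    print(d, dists.min(), dists.max())               # both equal d up to round-off
```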

updated version of this VAE?

I'm really interested in this model, but I couldn't make the code work on my computer or on a computing cluster because of the outdated TensorFlow version it uses. Is it possible to update it, or do you have any suggestions on how I can make it work? I would really appreciate your help!

import

"Resource exhausted: OOM when allocating tensor with shape[940,196]"

When I loaded a model in intro_to_chemvae.ipynb, it reported the following:

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[196,196]
	 [[Node: z_mean_sample_4/kernel/Assign = Assign[T=DT_FLOAT, _class=["loc:@z_mean_sample_4/kernel"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](z_mean_sample_4/kernel, z_mean_sample_4/random_uniform)]]

Is it due to my server's memory?
My server's memory is 16GB, CPU is Intel(R) Xeon(R) CPU E5-1603 v4 @ 2.80GHz.

Cannot reproduce the h5 files of the zinc_properties example

Hello,
I've been trying to reproduce the results of the zinc_properties model provided in the repository's default directory ./chemical_vae/models/zinc_properties.

Basically, I just cd to the zinc_properties directory and run
python3 -m chemvae.train_vae
for 120 epochs with the default exp.json file, and end up with the three files zinc_decoder.h5, zinc_encoder.h5 and zinc_prop_pred.h5.

Now, if I use those files in the Jupyter notebook example /chemical_vae/examples/intro_to_chemvae.ipynb, the "encode then decode" test shown below does not work: it neither recovers the original encoded SMILES nor generates similar SMILES with a noise of 5.0, although everything works with the original h5 files.

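# Using the VAE
## Decode/Encode
import numpy as np
from chemvae.vae_utils import VAEUtils
from chemvae import mol_utils as mu

# Model directory holding the retrained h5 files (path relative to examples/; adjust as needed)
vae = VAEUtils(directory='../models/zinc_properties')

smiles_1 = mu.canon_smiles('CSCC(=O)NNC(=O)c1c(C)oc(C)c1C')
# smiles_1 = mu.canon_smiles('Cc1cc2c(cc1S(=O)(=O)NC1CCC(C)CC1)OCCN2C')

# Encode the one-hot SMILES into latent space, then decode it back
X_1 = vae.smiles_to_hot(smiles_1, canonize_smiles=True)
z_1 = vae.encode(X_1)
X_r = vae.decode(z_1)

print('{:20s} : {}'.format('Input', smiles_1))
print('{:20s} : {}'.format('Reconstruction', vae.hot_to_smiles(X_r, strip=True)[0]))

print('{:20s} : {} with norm {:.3f}'.format('Z representation', z_1.shape, np.linalg.norm(z_1)))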

Were the provided h5 files obtained with the .csv and .json files that are provided in the same zinc_properties directory of the GitHub repository?

Thank you very much for your work; it is very interesting.
Best Regards
Hugues

ValueError: Sample larger than population or is negative

Hi! I have a question. I used the QM9 dataset (10,000 SMILES with properties).
I tried to run intro_to_chemvae.py, but the errors below occurred.

  File "intro_to_chemvae.py", line 19, in <module>
    vae = VAEUtils(directory='../models/QM9')
  File "/home/anaconda3/envs/chemvae/lib/python3.6/site-packages/chemvae-1.0.0-py3.6.egg/chemvae/vae_utils.py", line 52, in __init__
  File "/home/anaconda3/envs/chemvae/lib/python3.6/site-packages/chemvae-1.0.0-py3.6.egg/chemvae/vae_utils.py", line 61, in estimate_estandarization
  File "/home/anaconda3/envs/chemvae/lib/python3.6/site-packages/chemvae-1.0.0-py3.6.egg/chemvae/vae_utils.py", line 289, in random_molecules
  File "/home/anaconda3/envs/chemvae/lib/python3.6/random.py", line 320, in sample
    raise ValueError("Sample larger than population or is negative")
ValueError: Sample larger than population or is negative

I suspect this error occurs because the QM9 dataset is smaller than the ZINC dataset.
I would appreciate it if you could let me know.

Best wishes
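The traceback runs from `VAEUtils.__init__` through `estimate_estandarization` and `random_molecules` into `random.sample`, which raises exactly this `ValueError` when asked for more items than the population holds. That supports the suspicion above: the standardisation step appears to request a fixed number of molecules that the smaller QM9 set cannot supply. The capped sampler below is a hypothetical sketch of the fix, not chemvae code; the number of molecules requested inside `vae_utils.py` is not visible in the traceback.

```python
import random


def sample_up_to(population, k, seed=None):
    """Sample k items without replacement, or every item if fewer exist.

    random.sample raises ValueError("Sample larger than population ...")
    when k exceeds len(population); capping k avoids that for small datasets.
    """
    items = list(population)
    rng = random.Random(seed)
    return rng.sample(items, min(k, len(items)))


print(len(sample_up_to(range(10), 50, seed=0)))  # → 10
```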

Is the Gaussian process for optimization of molecules included in the repository?

Hi,

I have read your paper and think it is wonderful work. The paper says that a Gaussian process can be used in the latent space to optimize molecules for a specific property. Is there an example of this? Thank you.
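The paper's optimisation step fits a Gaussian process to property values over latent vectors and uses it to search the latent space. Whether that code is in this repository is exactly what the issue asks, so the NumPy sketch below is not the authors' implementation: it is a textbook zero-mean GP regression with a squared-exponential kernel and fixed, assumed hyperparameters, showing the posterior mean and variance that such an optimiser would query at candidate latent points.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between rows of a (n, d) and b (m, d)."""
    sq_dist = (np.sum(a ** 2, axis=1)[:, None] + np.sum(b ** 2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale ** 2)


def gp_posterior(z_train, y_train, z_query, noise=1e-6, **kernel_args):
    """Posterior mean and variance of zero-mean GP regression at z_query."""
    k_tt = rbf_kernel(z_train, z_train, **kernel_args) + noise * np.eye(len(z_train))
    k_qt = rbf_kernel(z_query, z_train, **kernel_args)
    chol = np.linalg.cholesky(k_tt)                                   # K = L L^T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))   # K^-1 y
    mean = k_qt @ alpha
    v = np.linalg.solve(chol, k_qt.T)
    prior_var = np.diag(rbf_kernel(z_query, z_query, **kernel_args))
    var = prior_var - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 0.0)   # clip tiny negative values from round-off


# Toy example: three latent points with property values, one far-away query.
z_train = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
y_train = np.array([1.0, -1.0, 0.5])
mean, var = gp_posterior(z_train, y_train, np.array([[0.0, 0.0], [50.0, 50.0]]))
print(mean, var)
```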

Decoder expects 3 inputs but it received 2 inputs

First, the requirements cannot be installed through conda because they are too outdated.
Keras always raises "AttributeError: can't set attribute" when loading the decoder. This comes from the layer in chemvae/tgru_k2_gpu.py:
the assignment self.units = ... has to use a different attribute name, because Keras/TensorFlow does not allow setting this attribute inherited from the superclass.
The same goes for self.recurrent_dropout.

After fixing both, I still get an error: Layer decoder_tgru expects 3 inputs, but it received 2 input tensors. Input received: [<tf.Tensor 'decoder_gru2_1/transpose_1:0' shape=(None, 120, 488) dtype=float32>, <tf.Tensor 'decoder_true_seq_input_2:0' shape=(None, 120, 35) dtype=float32>]

I still cannot figure out what is wrong with the layer. Should I use older TF and Keras versions to make it work?

Running autoencoder

When I execute the command python -m chemvae.train_vae, I get the following error:

Traceback (most recent call last):
  File "C:\Users\rodri647\AppData\Local\Continuum\anaconda2\envs\chemvae\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 14, in swig_import_helper
    return importlib.import_module(mname)
  File "C:\Users\rodri647\AppData\Local\Continuum\anaconda2\envs\chemvae\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 658, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 922, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: DLL load failed with error code -1073741795

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\rodri647\AppData\Local\Continuum\anaconda2\envs\chemvae\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "C:\Users\rodri647\AppData\Local\Continuum\anaconda2\envs\chemvae\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 17, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "C:\Users\rodri647\AppData\Local\Continuum\anaconda2\envs\chemvae\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 16, in swig_import_helper
    return importlib.import_module('_pywrap_tensorflow_internal')
  File "C:\Users\rodri647\AppData\Local\Continuum\anaconda2\envs\chemvae\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ModuleNotFoundError: No module named '_pywrap_tensorflow_internal'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\rodri647\AppData\Local\Continuum\anaconda2\envs\chemvae\lib\site-packages\tensorflow\__init__.py", line 24, in <module>
    from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
  File "C:\Users\rodri647\AppData\Local\Continuum\anaconda2\envs\chemvae\lib\site-packages\tensorflow\python\__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "C:\Users\rodri647\AppData\Local\Continuum\anaconda2\envs\chemvae\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "C:\Users\rodri647\AppData\Local\Continuum\anaconda2\envs\chemvae\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 14, in swig_import_helper
    return importlib.import_module(mname)
  File "C:\Users\rodri647\AppData\Local\Continuum\anaconda2\envs\chemvae\lib\imp
ortlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 658, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 922, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: DLL load failed with error code -1073741795

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\rodri647\AppData\Local\Continuum\anaconda2\envs\chemvae\lib\sit
e-packages\tensorflow\python\pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "C:\Users\rodri647\AppData\Local\Continuum\anaconda2\envs\chemvae\lib\sit
e-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 17, in <module
>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "C:\Users\rodri647\AppData\Local\Continuum\anaconda2\envs\chemvae\lib\sit
e-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 16, in swig_im
port_helper
    return importlib.import_module('_pywrap_tensorflow_internal')
  File "C:\Users\rodri647\AppData\Local\Continuum\anaconda2\envs\chemvae\lib\imp
ortlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ModuleNotFoundError: No module named '_pywrap_tensorflow_internal'


Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/install_sources#common_installation_probl
ems

for some common reasons and solutions.  Include the entire stack trace
above this error message when asking for help.
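The negative number in "DLL load failed with error code -1073741795" is a signed 32-bit Windows status code. Reinterpreting it as unsigned gives 0xC000001D, which Windows defines as STATUS_ILLEGAL_INSTRUCTION. A frequently reported cause with prebuilt TensorFlow binaries from 1.6 onward is a CPU without AVX support, although this report alone does not confirm that. The helper name ntstatus_hex is illustrative only.

```python
def ntstatus_hex(code: int) -> str:
    """Reinterpret a signed 32-bit Windows status code as unsigned hex."""
    # Masking with 0xFFFFFFFF maps a negative value to its two's-complement form.
    return "0x{:08X}".format(code & 0xFFFFFFFF)


print(ntstatus_hex(-1073741795))  # 0xC000001D
```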

intro_to_chemvae.ipynb loading model AttributeError: 'str' object has no attribute 'decode'

Executing vae = VAEUtils(directory='../models/zinc_properties') yields this error:

AttributeError Traceback (most recent call last)
in ()
----> 1 vae = VAEUtils(directory='../models/zinc_properties')

/home/rad/miniconda3/envs/chemvae/lib/python3.6/site-packages/chemvae-1.0.0-py3.6.egg/chemvae/vae_utils.py in __init__(self, exp_file, encoder_file, decoder_file, directory)
35 self.indices_char = dict((i, c) for i, c in enumerate(chars))
36 # encoder, decoder
---> 37 self.enc = load_encoder(self.params)
38 self.dec = load_decoder(self.params)
39 self.encode, self.decode = self.enc_dec_functions()

/home/rad/miniconda3/envs/chemvae/lib/python3.6/site-packages/chemvae-1.0.0-py3.6.egg/chemvae/models.py in load_encoder(params)
77 # return encoder
78 # !# not sure if this is the right format
---> 79 return load_model(params['encoder_weights_file'])
80
81

/home/rad/miniconda3/envs/chemvae/lib/python3.6/site-packages/keras/models.py in load_model(filepath, custom_objects, compile)
230 if model_config is None:
231 raise ValueError('No model found in config file.')
--> 232 model_config = json.loads(model_config.decode('utf-8'))
233 model = model_from_config(model_config, custom_objects=custom_objects)
234

AttributeError: 'str' object has no attribute 'decode'
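The failing line calls model_config.decode('utf-8'), which assumes the HDF5 attribute comes back as bytes. h5py 3.x returns such string attributes as str, which is a frequently reported cause of this error with Keras 2.x; installing an h5py release below 3.0 is the usual workaround. The helper below (parse_model_config is a hypothetical name, not a Keras function) only illustrates the version-tolerant check: decode when the value is bytes, use it directly when it is already str.

```python
import json


def parse_model_config(raw):
    """Parse a model_config attribute read from HDF5 as either bytes or str."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")  # h5py 2.x style: bytes need decoding
    return json.loads(raw)  # h5py 3.x style: already a str


print(parse_model_config(b'{"class_name": "Model"}'))
print(parse_model_config('{"class_name": "Model"}'))
```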

I can't find the .py file corresponding to the optimization of molecules via properties using a GP

Dear jnwei:

I am interested in real-world applications of Bayesian optimization. In the introduction of your paper, you state: "Gradient-based optimization can be combined with Bayesian optimization methods to select compounds that are likely to be informative about the global optimum." You also say that you use a GP to optimize the surrogate property predictor. The description in the paper is a little vague, and I can't tell how you actually use the GP. Could you point me to the file where you use it? I really can't find it.

Thanks

Wei-Cheng
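The general idea of choosing latent points with a Gaussian process can be sketched generically. This is not the authors' code, which the question above could not locate: it is a minimal NumPy sketch, assuming an RBF kernel with unit length scale, a toy objective standing in for the property predictor, and random vectors standing in for encoded molecules. It fits exact GP regression to (latent vector, property) pairs and picks the candidate with the highest expected improvement.

```python
import math

import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between the rows of a (n, d) and b (m, d)."""
    sq_dist = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Exact GP regression: posterior mean and standard deviation at x_query."""
    k = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_s = rbf_kernel(x_train, x_query)
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_s.T @ alpha
    v = np.linalg.solve(chol, k_s)
    var = np.maximum(1.0 - np.sum(v**2, 0), 1e-12)  # prior variance k(x, x) = 1
    return mean, np.sqrt(var)


def expected_improvement(mean, std, best):
    """EI for maximization: E[max(f - best, 0)] with f ~ N(mean, std**2)."""
    z = (mean - best) / std
    cdf = 0.5 * np.vectorize(math.erfc)(-z / math.sqrt(2.0))  # standard normal CDF
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


rng = np.random.default_rng(0)
z_train = rng.normal(size=(20, 2))       # stand-in for encoded molecules
y_train = -np.sum(z_train**2, axis=1)    # toy property, maximal at the origin
candidates = rng.normal(size=(200, 2))   # stand-in for candidate latent points
mean, std = gp_posterior(z_train, y_train, candidates)
ei = expected_improvement(mean, std, y_train.max())
next_z = candidates[np.argmax(ei)]       # point to decode and evaluate next
```

Expected improvement trades off a high predicted property (mean) against uncertainty (std), which matches the "informative about the global optimum" wording in the quoted sentence only in spirit.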

Version incompatible

RDKit is only supported on Python 3.7 and above, but TensorFlow 1.1 is only supported up to Python 3.5, so I can't install both packages in the same environment.

I am working on a new version of this project.

The project was deleted. You can find an interesting fork here: https://github.com/KnightTec/chemical_vae

How is limit_data used in exp.json?

When training on a million molecules, should we keep limit_data at 5000 or change it? Which parameters matter when training on a set of 1 million molecules?
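How train_vae.py actually applies limit_data should be confirmed in its source. The sketch below assumes it caps the number of training molecules by drawing a random subset of that size, in which case leaving limit_data at 5000 would train on only 5000 of the million molecules. select_training_smiles is a hypothetical helper, and the SMILES list is dummy data.

```python
import json
import random


def select_training_smiles(smiles, params, seed=None):
    """Hypothetical sketch: subsample `smiles` when params contains limit_data."""
    limit = params.get("limit_data")
    if limit is None or limit >= len(smiles):
        return list(smiles)  # no cap, or cap larger than the dataset: use everything
    rng = random.Random(seed)
    return rng.sample(list(smiles), limit)  # random subset, without replacement


params = json.loads('{"limit_data": 5000}')  # as it might appear in exp.json
all_smiles = ["C" * (i % 10 + 1) for i in range(20_000)]  # dummy SMILES
subset = select_training_smiles(all_smiles, params, seed=0)
print(len(subset))  # 5000
```

Under this assumption, removing the key (or setting it to the dataset size) would use every molecule.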

GPU Support

Does this package support GPU for training and if yes, how can we set that up?

I found the following code at the top of the train_vae.py file, but it is commented out and therefore unused. There is also no further information about it in the documentation, so overall it is unclear to me whether GPUs are supported.

# from gpu_utils import pick_gpu_lowest_memory
# gpu_free_number = str(pick_gpu_lowest_memory())
#
# import os
# os.environ['CUDA_VISIBLE_DEVICES'] = '{}'.format(gpu_free_number)

Thank you in advance!
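The commented-out helper pick_gpu_lowest_memory comes from a gpu_utils module whose code is not shown here. A hypothetical stand-in is sketched below, assuming an NVIDIA driver with nvidia-smi on the PATH and Python 3.7+: it queries the used memory of each GPU, picks the least-used index, and exports it through CUDA_VISIBLE_DEVICES. The variable only takes effect if it is set before TensorFlow initializes CUDA, so the safest place is before importing TensorFlow or Keras.

```python
import os
import subprocess


def parse_memory_used(csv_text):
    """Parse `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits` (MiB per GPU)."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def pick_gpu_lowest_memory_sketch():
    """Hypothetical stand-in for gpu_utils.pick_gpu_lowest_memory()."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    used = parse_memory_used(result.stdout)
    return min(range(len(used)), key=used.__getitem__)  # index of least-used GPU


def select_gpu(index):
    """Expose only one GPU to CUDA; call before TensorFlow/Keras is imported."""
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
```

Usage at the top of a training script would be select_gpu(pick_gpu_lowest_memory_sketch()), followed by the TensorFlow and Keras imports.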

NameError: name 'TerminalGRU' is not defined

Hi. I am trying to use your model with the following line:

vae = VAEUtils(directory=r'C:\Users\Autoencoder\chemical_vae-master\models\zinc_properties')

However, it results in the following error:

~\Autoencoder\chemical_vae-master\chemvae\vae_utils.py in __init__(self, exp_file, encoder_file, decoder_file, directory)
55 # encoder, decoder
56 self.enc = load_encoder(self.params)
---> 57 self.dec = load_decoder(self.params)
58 self.encode, self.decode = self.enc_dec_functions()
59 self.data = None

~\Autoencoder\chemical_vae-master\chemvae\vae_utils.py in load_decoder(params)
21 def load_decoder(params):
22 if params['do_tgru']:
---> 23 return load_model(params['decoder_weights_file'], custom_objects={'TerminalGRU': TerminalGRU})
24 else:
25 return load_model(params['decoder_weights_file'])

NameError: name 'TerminalGRU' is not defined

Could you tell me what I should do to make it work?
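load_model can only rebuild the decoder if the name TerminalGRU refers to the class object inside vae_utils.py, so this NameError suggests the module never imports it. Assuming the class is defined in chemvae/tgru_k2_gpu.py (the file named in the "Decoder expects 3 inputs" issue above), adding from chemvae.tgru_k2_gpu import TerminalGRU at the top of vae_utils.py is a likely fix. The runnable sketch below mirrors load_decoder with hypothetical stand-ins (the empty TerminalGRU class, fake_load_model and the file name are not the real ones) to show that custom_objects maps the saved layer name to a class that must be in scope.

```python
class TerminalGRU:
    """Hypothetical stand-in for the custom layer class in chemvae/tgru_k2_gpu.py."""


def fake_load_model(path, custom_objects=None):
    """Hypothetical stand-in for keras.models.load_model: echoes its arguments."""
    return path, dict(custom_objects or {})


def load_decoder(params):
    # Same shape as load_decoder in the traceback; it only works because
    # TerminalGRU is bound in this module's namespace.
    if params["do_tgru"]:
        return fake_load_model(params["decoder_weights_file"],
                               custom_objects={"TerminalGRU": TerminalGRU})
    return fake_load_model(params["decoder_weights_file"])


print(load_decoder({"do_tgru": True, "decoder_weights_file": "decoder.h5"}))
```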

How to generate figures similar to your paper?

Do you have code that can generate figures similar to those in your paper? Many thanks.

train_vae not picking up GPU?

While running the train_vae script, my GPU apparently isn't being used: CPU usage is above 300%, but the GPU seems idle. My keras.json file specifies tensorflow as the backend, and the KERAS_BACKEND environment variable is also set to tensorflow. Is there anything else I need to do to use my GPU for training?
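One thing worth checking is whether the installed TensorFlow build can see a GPU at all. With TensorFlow 1.x, the tensorflow pip package is CPU-only and GPU support comes from the separate tensorflow-gpu package (plus matching CUDA and cuDNN libraries), so a CPU-only install would behave as described. The helper below (describe_tensorflow_gpu is a hypothetical name) lists the GPU devices TensorFlow reports, using device_lib.list_local_devices from TensorFlow's internal tensorflow.python.client module.

```python
import importlib.util


def describe_tensorflow_gpu():
    """Report the TensorFlow version and the GPU devices it can see."""
    if importlib.util.find_spec("tensorflow") is None:
        return "tensorflow is not installed in this environment"
    import tensorflow as tf
    from tensorflow.python.client import device_lib  # internal, long-standing module

    gpus = [d.name for d in device_lib.list_local_devices() if d.device_type == "GPU"]
    return "TensorFlow {} sees GPUs: {}".format(tf.__version__, gpus or "none")


print(describe_tensorflow_gpu())
```

If the list is empty, Keras will silently fall back to the CPU regardless of the backend settings.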

name 'vae' is not defined

Hi, I am trying a test run with zinc_properties and trained on about 100 molecules from the CSV file. After that I have several *.h5 files. When I go to intro_to_chemvae.ipynb, I get the following error:

NameError Traceback (most recent call last)
in
1 smiles_1 = mu.canon_smiles('CSCC(=O)NNC(=O)c1c(C)oc(C)c1C')
2
----> 3 X_1 = vae.smiles_to_hot(smiles_1,canonize_smiles=True)
4 z_1 = vae.encode(X_1)
5 X_r= vae.decode(z_1)

NameError: name 'vae' is not defined.

Everything else seems fine. Can anyone help me find out what is going wrong?

vae_utils

The function estimate_estandarization(self) fails because it encounters a NaN or empty SMILES string and stops. This can be worked around temporarily by wrapping the per-chunk code in the following try/except block:

try:
    sub_smiles = [smiles[i] for i in chunk]
    one_hot = self.smiles_to_hot(sub_smiles)
    Z[chunk, :] = self.encode(one_hot, False)
except ValueError:
    # Report the offending chunk; its rows in Z keep their initial values.
    # one_hot is not printed: it may be undefined (or stale) if smiles_to_hot raised.
    print(len(sub_smiles))
    print(sub_smiles)
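Instead of catching the ValueError per chunk, another option is to drop missing or blank SMILES before encoding. The hypothetical helper below keeps only non-empty strings (stripped of surrounding whitespace), which removes None, NaN (a float) and empty entries. It does not check whether a non-empty string is valid SMILES, so malformed strings could still fail later.

```python
def clean_smiles(values):
    """Keep non-empty string SMILES; drop None, NaN (a float) and blank entries."""
    return [v.strip() for v in values if isinstance(v, str) and v.strip()]


print(clean_smiles(["CCO", float("nan"), "", None, " c1ccccc1 "]))
# ['CCO', 'c1ccccc1']
```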


Searching molecules randomly sampled from 5.00 std (z-distance) from the point Found 0 unique mols, out of 0 SMILES Series([], Name: smiles, dtype: object)


Searching molecules randomly sampled from 5.00 std (z-distance) from the point
Found 0 unique mols, out of 0
SMILES
Series([], Name: smiles, dtype: object)