wangxr0526 / retroprime

Code for Single-step Retrosynthesis model Retroprime

License: MIT License

Shell 4.64% Python 75.14% Jupyter Notebook 9.74% Makefile 0.08% TeX 2.82% Perl 4.62% Smalltalk 0.24% Emacs Lisp 2.15% JavaScript 0.11% NewLisp 0.20% Ruby 0.21% Slash 0.04% SystemVerilog 0.02%

retroprime's Issues

Running run_example.sh raises RuntimeError: "index_select_out_cuda_impl" not implemented for 'Float'

Products to Synthons
0%| | 0/1 [00:00<?, ?it/s]/home/lzf/software/anaconda3/envs/seq_gr/lib/python3.7/site-packages/torchtext/data/field.py:359: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
var = torch.tensor(arr, dtype=self.dtype, device=device)
/home/lzf/programme/Retroprime/RetroPrime/retroprime/transformer_model/onmt/translate/translator.py:613: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
return torch.tensor(a, requires_grad=False)
0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "retroprime/transformer_model/translate.py", line 53, in <module>
    main(opt)
  File "retroprime/transformer_model/translate.py", line 34, in main
    attn_debug=opt.attn_debug)
  File "/home/lzf/programme/Retroprime/RetroPrime/retroprime/transformer_model/onmt/translate/translator.py", line 238, in translate
    batch_data = self.translate_batch(batch, data, fast=self.fast)
  File "/home/lzf/programme/Retroprime/RetroPrime/retroprime/transformer_model/onmt/translate/translator.py", line 375, in translate_batch
    return self._translate_batch(batch, data)
  File "/home/lzf/programme/Retroprime/RetroPrime/retroprime/transformer_model/onmt/translate/translator.py", line 712, in _translate_batch
    beam_attn.data[:, j, :memory_lengths[j]])
  File "/home/lzf/programme/Retroprime/RetroPrime/retroprime/transformer_model/onmt/translate/beam.py", line 140, in advance
    self.attn.append(attn_out.index_select(0, prev_k))
RuntimeError: "index_select_out_cuda_impl" not implemented for 'Float'
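
The traceback ends in attn_out.index_select(0, prev_k), and PyTorch requires the index argument of index_select to be an integer tensor. One likely cause, offered as an assumption rather than a confirmed diagnosis: prev_k is obtained by dividing a flattened beam-by-vocabulary index by the vocabulary size, and on newer PyTorch releases "/" on integer tensors performs true division and returns floats. The plain-Python sketch below needs no PyTorch; split_flat_indices and the sample values are hypothetical. It shows the same pitfall and the floor-division form that keeps the indices integral.

```python
from typing import List, Tuple


def split_flat_indices(flat_ids: List[int], num_words: int) -> Tuple[List[int], List[int]]:
    """Map flattened (beam x vocab) positions back to (beam, token) pairs."""
    beam_ids = [i // num_words for i in flat_ids]  # floor division keeps ints
    token_ids = [i % num_words for i in flat_ids]  # position inside the vocabulary
    return beam_ids, token_ids


num_words = 5            # hypothetical vocabulary size
flat_ids = [7, 12, 3]    # hypothetical top-k positions in the flattened scores

beam_ids, token_ids = split_flat_indices(flat_ids, num_words)

# True division would give floats, which cannot be used as indices.
float_ids = [i / num_words for i in flat_ids]

# Integer beam ids can select rows, mirroring attn_out.index_select(0, prev_k).
attn_rows = ["row0", "row1", "row2"]
selected = [attn_rows[b] for b in beam_ids]
```

If this is indeed the cause, the analogous change in beam.py would be to compute prev_k with floor division, or to run the code on the older PyTorch version it was written for.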

How to make the reaction type unknown?

Hi,
I would like to see the results on our dataset when the reaction type is unknown.
Which training parameter controls whether the reaction type is used?
In other words, what should I change to train with the reaction type unknown?

FileNotFoundError

Hello! When I run the run_example.sh script, it gives me a FileNotFoundError. Could you please let me know where I can obtain the files USPTO-50K_pos_pred_model_step_90000.pt and USPTO-50K_S2R_model_step_100000.pt? Thank you very much!
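
A small pre-flight check can make this kind of failure easier to read than a bare FileNotFoundError. The sketch below only reports which of the two checkpoint files named above are absent from a directory; the directory path and the helper name missing_checkpoints are hypothetical, and it does not say where the files can be downloaded.

```python
from pathlib import Path
from typing import Iterable, List

# File names taken from the question above.
CHECKPOINTS = [
    "USPTO-50K_pos_pred_model_step_90000.pt",
    "USPTO-50K_S2R_model_step_100000.pt",
]


def missing_checkpoints(model_dir: str, names: Iterable[str] = CHECKPOINTS) -> List[str]:
    """Return the checkpoint file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in names if not (root / name).is_file()]


# "path/to/checkpoints" is a placeholder; point it at the directory that
# run_example.sh reads its models from.
missing = missing_checkpoints("path/to/checkpoints")
if missing:
    print("Missing checkpoint files:", ", ".join(missing))
```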

Always Loading Data

When I run train.py, the output stays at "loading data" indefinitely. Is this normal?

How to get the data needed in the new_raw_all.csv file?

In the provided data directory, the file retrosim/retrosim/data/get_data.py, which is used to split data_preprocessed.csv into train, validation and test sets, is incomplete: it fails with an error about a missing function definition. How can I get the train, validation and test data needed to compile new_raw_all.csv?

Even this data: https://raw.githubusercontent.com/connorcoley/retrosim/0a272f0b5de833c448f41491e81e4dc00b4d85b0/retrosim/data/data_processed.csv
does not follow the format that retroprime needs.
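
Until the missing function is available, a per-class split can be sketched with the standard library alone. Everything here is an assumption: the "class" key, the 80/10/10 fractions, the seed and the helper name split_by_class are illustrative, and the sketch neither reproduces get_data.py nor produces the column layout that new_raw_all.csv expects.

```python
import random
from collections import defaultdict
from typing import Dict, List


def split_by_class(
    rows: List[dict],
    class_key: str = "class",   # hypothetical column holding the reaction class
    val_frac: float = 0.1,
    test_frac: float = 0.1,
    seed: int = 0,
) -> Dict[str, List[dict]]:
    """Shuffle rows within each class, then cut each class into train/val/test."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[class_key]].append(row)

    rng = random.Random(seed)
    splits: Dict[str, List[dict]] = {"train": [], "val": [], "test": []}
    for cls in sorted(by_class):
        group = by_class[cls][:]
        rng.shuffle(group)
        n_val = int(len(group) * val_frac)
        n_test = int(len(group) * test_frac)
        splits["val"] += group[:n_val]
        splits["test"] += group[n_val:n_val + n_test]
        splits["train"] += group[n_val + n_test:]
    return splits


# Toy rows standing in for records read with csv.DictReader.
rows = [{"id": i, "class": i % 2} for i in range(40)]
splits = split_by_class(rows)
```

Splitting inside each class keeps the class proportions roughly equal across the three sets.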

Acc drops rapidly when training the P2S model in the uspto-full dataset

I would like to reproduce the results on the uspto-full dataset, but I have run into a problem: the accuracy of P2S drops rapidly.

I have trained for over 50,000 steps and the accuracy was about 50%. Is this normal?

[2022-08-09 14:28:33,864 INFO] encoder: 41252864
[2022-08-09 14:28:33,865 INFO] decoder: 54924817
[2022-08-09 14:28:33,865 INFO] * number of parameters: 96177681
[2022-08-09 14:28:33,889 INFO] Start training...
[2022-08-09 14:28:41,843 INFO] Loading train dataset from data/uspto_full_pos_pred/uspto_full_pos_pred.train.0.pt, number of examples: 1000000
[2022-08-09 14:28:41,844 INFO] train_iter finished
[2022-08-09 15:03:38,255 INFO] Step 1000/250000; acc:  81.60; ppl:  1.78; xent: 0.58; lr: 0.00012; 7395/7535 tok/s;   2096 sec
[2022-08-09 15:38:33,482 INFO] Step 2000/250000; acc:  95.38; ppl:  1.15; xent: 0.14; lr: 0.00025; 7380/7566 tok/s;   4192 sec
[2022-08-09 16:05:19,712 INFO] Loading train dataset from data/uspto_full_pos_pred/uspto_full_pos_pred.train.1.pt, number of examples: 1000000
[2022-08-09 16:13:46,337 INFO] Step 3000/250000; acc:  97.17; ppl:  1.09; xent: 0.09; lr: 0.00037; 7480/7610 tok/s;   6304 sec
[2022-08-09 16:48:51,050 INFO] Step 4000/250000; acc:  96.94; ppl:  1.11; xent: 0.11; lr: 0.00049; 7355/7484 tok/s;   8409 sec
[2022-08-09 17:23:55,865 INFO] Step 5000/250000; acc:  96.57; ppl:  1.11; xent: 0.10; lr: 0.00062; 7270/7407 tok/s;  10514 sec
[2022-08-09 17:42:44,180 INFO] Loading train dataset from data/uspto_full_pos_pred/uspto_full_pos_pred.train.2.pt, number of examples: 1000000
[2022-08-09 17:59:22,875 INFO] Step 6000/250000; acc:  96.90; ppl:  1.09; xent: 0.09; lr: 0.00074; 7399/7564 tok/s;  12641 sec
[2022-08-09 18:34:50,745 INFO] Step 7000/250000; acc:  95.93; ppl:  1.12; xent: 0.11; lr: 0.00086; 7320/7514 tok/s;  14769 sec
[2022-08-09 19:10:08,914 INFO] Step 8000/250000; acc:  96.79; ppl:  1.10; xent: 0.09; lr: 0.00099; 7289/7435 tok/s;  16887 sec
[2022-08-09 19:20:38,308 INFO] Loading train dataset from data/uspto_full_pos_pred/uspto_full_pos_pred.train.3.pt, number of examples: 1000000
[2022-08-09 19:45:17,775 INFO] Step 9000/250000; acc:  96.71; ppl:  1.11; xent: 0.10; lr: 0.00093; 7414/7583 tok/s;  18996 sec
[2022-08-09 20:20:10,783 INFO] Step 10000/250000; acc:  96.74; ppl:  1.11; xent: 0.10; lr: 0.00088; 7208/7353 tok/s;  21089 sec
[2022-08-09 20:20:10,787 INFO] Saving checkpoint experiments/checkpoints/uspto_full_pos_pred/151_uspto_full_pos_pred_model_step_10000.pt
[2022-08-09 20:55:25,013 INFO] Step 11000/250000; acc:  88.51; ppl:  1.44; xent: 0.36; lr: 0.00084; 6400/6718 tok/s;  23203 sec
[2022-08-09 21:00:21,009 INFO] Loading train dataset from data/uspto_full_pos_pred/uspto_full_pos_pred.train.4.pt, number of examples: 1000000
[2022-08-09 21:30:28,027 INFO] Step 12000/250000; acc:  92.56; ppl:  1.27; xent: 0.24; lr: 0.00081; 7225/7443 tok/s;  25306 sec
[2022-08-09 22:05:20,563 INFO] Step 13000/250000; acc:  85.15; ppl:  1.60; xent: 0.47; lr: 0.00078; 7314/7448 tok/s;  27399 sec
[2022-08-09 22:40:05,992 INFO] Step 14000/250000; acc:  62.15; ppl:  3.08; xent: 1.13; lr: 0.00075; 7438/7593 tok/s;  29484 sec
[2022-08-09 22:40:22,370 INFO] Loading train dataset from data/uspto_full_pos_pred/uspto_full_pos_pred.train.5.pt, number of examples: 1000000
[2022-08-09 23:14:50,982 INFO] Step 15000/250000; acc:  48.91; ppl:  4.95; xent: 1.60; lr: 0.00072; 7404/7537 tok/s;  31569 sec
[2022-08-09 23:49:25,286 INFO] Step 16000/250000; acc:  51.57; ppl:  4.75; xent: 1.56; lr: 0.00070; 6632/7134 tok/s;  33643 sec
[2022-08-10 00:19:41,798 INFO] Loading train dataset from data/uspto_full_pos_pred/uspto_full_pos_pred.train.6.pt, number of examples: 1000000
[2022-08-10 00:24:05,406 INFO] Step 17000/250000; acc:  52.31; ppl:  4.61; xent: 1.53; lr: 0.00068; 7293/7659 tok/s;  35724 sec
[2022-08-10 00:58:41,297 INFO] Step 18000/250000; acc:  50.13; ppl:  4.72; xent: 1.55; lr: 0.00066; 6996/7350 tok/s;  37799 sec
[2022-08-10 01:33:25,863 INFO] Step 19000/250000; acc:  52.59; ppl:  4.30; xent: 1.46; lr: 0.00064; 7378/7524 tok/s;  39884 sec

Any reply would be greatly appreciated!
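
The lr column in the log is consistent with a Noam-style schedule (linear warmup, then inverse-square-root decay). The parameters below (8000 warmup steps, model dimension 512, scale 2) are inferred from the logged numbers, not read from the training configuration, so treat them as assumptions. The sketch only reproduces the logged learning rates. It shows that the accuracy starts to fall shortly after the rate peaks near step 8000, which may or may not be the cause of the drop.

```python
def noam_lr(step: int, warmup_steps: int = 8000, d_model: int = 512, scale: float = 2.0) -> float:
    """Noam schedule: linear warmup followed by inverse-square-root decay."""
    return scale * d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


# (step, lr) pairs copied from the training log above.
logged = {1000: 0.00012, 8000: 0.00099, 10000: 0.00088, 15000: 0.00072, 19000: 0.00064}

for step, lr in logged.items():
    print(f"step {step:>6}: logged {lr:.5f}, schedule {noam_lr(step):.5f}")
```

If the schedule is the culprit, a lower peak rate or a longer warmup would be the usual things to try, but the log alone does not establish that.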

Environment setup failure

Opening this as more of a note to future readers. The current Anaconda environment setup instructions fail to solve for me, and after much experimentation I got a working environment like so:

conda create -n retroprime-env python=3.6 pytorch=1.5.0 torchvision torchtext cudatoolkit=10.1 -c pytorch
conda activate retroprime-env
pip install rdkit-pypi
conda install pandas tqdm six

For whatever reason, this would only resolve for me if I specified all of the PyTorch dependencies when first creating the environment, as above. I found that RDKit had to be installed from pip (conda complains about conflicts).

Newer versions of PyTorch will not work and will fail with a rather opaque error about index data types. Additionally, the project uses legacy data structures from older versions of torchtext.

Also note that some of these are undocumented dependencies, but you'll find you need them when trying to train and test.
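
A quick guard against running with an untested PyTorch version might look like the sketch below. It only encodes what the note above reports, namely that pytorch=1.5.0 worked; treating every other release as unverified is an assumption, and the helper names are hypothetical.

```python
import re
from typing import Tuple

# From the note above: pytorch=1.5.0 gave a working environment.
KNOWN_GOOD = (1, 5)


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '1.5.0' or '1.13.1+cu117'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_known_good_torch(version: str) -> bool:
    """True only for the 1.5.x series that the note reports as working."""
    return parse_major_minor(version) == KNOWN_GOOD


# In a real environment the string would come from torch.__version__.
for candidate in ("1.5.0", "1.13.1+cu117"):
    status = "ok" if is_known_good_torch(candidate) else "untested, may fail"
    print(f"{candidate}: {status}")
```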
