thunlp-mt / thumt

An open-source neural machine translation toolkit developed by Tsinghua Natural Language Processing Group

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%
neural-machine-translation machine-translation deep-learning

thumt's Introduction

THUMT: An Open Source Toolkit for Neural Machine Translation

Contents

  • Introduction
  • Online Demo
  • Implementations
  • Notable Features
  • Documentation
  • License
  • Citation
  • Development Team
  • Contact
  • Derivative Repositories

Introduction

Machine translation is a natural language processing task that aims to translate between natural languages automatically using computers. Recent years have witnessed the rapid development of end-to-end neural machine translation, which has become the new mainstream method in practical MT systems.

THUMT is an open-source toolkit for neural machine translation developed by the Natural Language Processing Group at Tsinghua University. The website of THUMT is: http://thumt.thunlp.org/.

Online Demo

The online demo of THUMT is available at http://translate.thumt.cn/. The languages involved include Ancient Chinese, Arabic, Chinese, English, French, German, Indonesian, Japanese, Portuguese, Russian, and Spanish.

Implementations

THUMT currently has three main implementations: THUMT-Theano, THUMT-TensorFlow, and THUMT-PyTorch.

The following table summarizes the features of the three implementations:

Implementation | Model                           | Criterion     | Optimizer           | LRP
Theano         | RNNsearch                       | MLE, MRT, SST | SGD, AdaDelta, Adam | RNNsearch
TensorFlow     | Seq2Seq, RNNsearch, Transformer | MLE           | Adam                | RNNsearch, Transformer
PyTorch        | Transformer                     | MLE           | SGD, Adadelta, Adam | N.A.

We recommend using THUMT-PyTorch or THUMT-TensorFlow, which deliver better translation performance than THUMT-Theano. We will keep adding new features to THUMT-PyTorch and THUMT-TensorFlow.

Notable Features

  • Transformer (Vaswani et al., 2017)
  • Multi-GPU training & decoding
  • Multi-worker distributed training
  • Mixed precision training & decoding
  • Model ensemble & averaging
  • Gradient aggregation (see the sketch after this list)
  • TensorBoard for visualization
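
Of the features above, gradient aggregation means accumulating gradients over several micro-batches before applying one optimizer step, so that a large effective batch size fits in limited GPU memory. Below is a minimal, hypothetical PyTorch sketch of the idea; it is not THUMT's actual code, and the toy model and data are placeholders:

import torch
import torch.nn as nn

model = nn.Linear(8, 8)                                   # stand-in for a real NMT model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
update_cycle = 4                                          # micro-batches per optimizer step

optimizer.zero_grad()
for step, (x, y) in enumerate([(torch.randn(2, 8), torch.randn(2, 8))] * 8):
    loss = nn.functional.mse_loss(model(x), y) / update_cycle  # scale so the sum averages
    loss.backward()                                       # gradients accumulate in .grad
    if (step + 1) % update_cycle == 0:
        optimizer.step()                                  # apply the aggregated gradient
        optimizer.zero_grad()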

Documentation

The documentation of the PyTorch implementation is available here.

License

The source code is dual licensed. Open source licensing is under the BSD-3-Clause, which allows free use for research purposes. For commercial licensing, please email [email protected].

Citation

Please cite the following paper:

Zhixing Tan, Jiacheng Zhang, Xuancheng Huang, Gang Chen, Shuo Wang, Maosong Sun, Huanbo Luan, Yang Liu. THUMT: An Open Source Toolkit for Neural Machine Translation. AMTA 2020.

Jiacheng Zhang, Yanzhuo Ding, Shiqi Shen, Yong Cheng, Maosong Sun, Huanbo Luan, Yang Liu. 2017. THUMT: An Open Source Toolkit for Neural Machine Translation. arXiv:1706.06415.

Development Team

Project leaders: Maosong Sun, Yang Liu, Huanbo Luan

Project members:

Theano: Jiacheng Zhang, Yanzhuo Ding, Shiqi Shen, Yong Cheng

TensorFlow: Zhixing Tan, Jiacheng Zhang, Xuancheng Huang, Gang Chen, Shuo Wang, Zonghan Yang

PyTorch: Zhixing Tan, Gang Chen

Contact

If you have questions, suggestions and bug reports, please email [email protected].

Derivative Repositories

  • UCE4BT (Improving Back-Translation with Uncertainty-based Confidence Estimation)
  • L2Copy4APE (Learning to Copy for Automatic Post-Editing)
  • Document-Transformer (Improving the Transformer Translation Model with Document-Level Context)
  • PR4NMT (Prior Knowledge Integration for Neural Machine Translation using Posterior Regularization)

thumt's People

Contributors

alphadl, efanzh, glaceon31, grittychen, minicheshire, playinf, shuo-git, stevensgeek41, thudcsly, thumt, xc-kiwiberry, xiaoqingnlp, xiaotaw


thumt's Issues

batch size of Minimum risk training

Hi, it seems that the batch size for minimum risk training has to be set to 1, which would make training much slower when the dataset is very large. Is there any way to avoid this?

Has the latest version added a multi-GPU training module?

That is an important and efficient trick for speeding up training. I saw your team's earlier paper, which said that a multi-GPU function would be implemented soon; however, I cannot find any description of it in your README.

ANN acceleration

Do you plan to implement ANN acceleration?

How to decode beam-size sentences?

I followed the user manual to train and decode a translation task, but the decoding output file contains only one decoded sentence per input even though I set the beam size to 5. How can I decode beam-size sentences?
This is my running script for decoding:

python THUMT/thumt/bin/translator.py
--models transformer
--input newstest2015.tc.32k.de
--output newstest2015.trans
--vocabulary vocab.32k.de.txt vocab.32k.en.txt
--checkpoints train/eval
--parameters=device_list=[0],beam_size=5

What hardware configuration is required?

I have two Tesla K40M GPUs with 12 GB of memory each, and the program hangs as soon as it starts running.
It repeatedly ends up in this state:
INFO:tensorflow:seq2seq/softmax/matrix shape (1000, 878534)
INFO:tensorflow:seq2seq/source_embedding/bias shape (1000,)
INFO:tensorflow:seq2seq/source_embedding/embedding shape (1804778, 1000)
INFO:tensorflow:seq2seq/target_embedding/bias shape (1000,)
INFO:tensorflow:seq2seq/target_embedding/embedding shape (878534, 1000)
INFO:tensorflow:Total trainable variables size: 3626758534
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Create EvaluationHook.
INFO:tensorflow:Graph was finalized.
2018-05-30 10:42:22.524117: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
The first few times the system killed the process automatically; the last time it simply hung.

Confused about vocab.py/get_control_mapping

Great project! Thanks for your contribution.
While reading through the project, I became confused about the get_control_mapping method in vocab.py. This method seems to convert the vocabulary table (a list) into a vocabulary mapping (a dictionary). But the "symbols" parameter is not used in this method, since the vocabulary table already contains the special symbols. And even when those symbols do not exist in the vocabulary table, this method cannot add them to the mapping dictionary.
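
For reference, here is a minimal hypothetical sketch of the kind of list-to-dictionary mapping the question describes; build_mapping and its parameters are illustrative names, not the actual vocab.py code:

def build_mapping(vocab_list, control_symbols=()):
    # Map each token to its index in the vocabulary list.
    mapping = {token: idx for idx, token in enumerate(vocab_list)}
    # Such a mapping can only report missing control symbols, not add them.
    missing = [s for s in control_symbols if s not in mapping]
    if missing:
        raise ValueError("control symbols missing from vocabulary: %s" % missing)
    return mapping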

Compatibility Tips for TensorFlow-1.4.0

I use Python 2.7 and tensorflow==1.4.0.
The error is:
me: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
Traceback (most recent call last):
File "thumt/bin/trainer.py", line 466, in
main(parse_args())
File "thumt/bin/trainer.py", line 459, in main
sess.run_step_fn(restore_fn)
AttributeError: 'MonitoredSession' object has no attribute 'run_step_fn'

We look forward to your reply. Thank you very much.

The error of decoding

(venv-2.7.14) ubuntu@ubuntu:~/python2.7/tensorflow$ sed -r ’s/(@@ )|(@@ ?$)//g’ < newstest2015.trans > newstest2015.trans.norm
bash: syntax error near unexpected token `('
How can I solve this problem?
Thanks!

How about the performance?

I reproduced this experiment with the RNNsearch and Seq2Seq models, and got an average BLEU of 37.354 (RNNsearch) and 21.774 (Seq2Seq) on a zh-en corpus. I would like to know whether my baseline scores are normal; the Seq2Seq score seems a little low. Could you share your performance numbers?

A typo in the decoding section (Section 3.3) of the documentation

1 $ python THUMT/thumt/bin/translator.py --models transformer
2 --input newstest2015.tc.32k.de --output newstest2015.trans
3 --vocabulary vocab.32k.de vocab.32k.en --checkpoints train/eval
4 --parameters=device_list[0]

Line 4 should read --parameters=device_list=[0]

GPU support

We tested on a single Tesla K80 GPU with the following command:
sudo python trainer.py --config-file config/THUMT.config --trn-src-file data/train.src --trn-trg-file data/train.trg --vld-src-file data/valid.src --vld-trg-file data/valid.trg --device gpu0 --replace-unk

The program runs, but checking with nvidia-smi shows that the GPU is not being used.

Something is wrong when I pass the params.output argument: the model is not copied correctly to path/to/eval.

hooks.py, line 282:

if added is not None:
    old_path = os.path.join(self._base_dir, added)
    new_path = os.path.join(self._save_path, added)
    old_files = tf.gfile.Glob(old_path + "*")
    tf.logging.info("Copying %s to %s" % (old_path, new_path))

The following is a simple reproduction of the problem; it may help you confirm it:

import os
added = '/data/xqzhou/new_thumt_Dir/model/transformer-baseline-1gpu/model.ckpt-2002'
old_path = os.path.join('/data/xqzhou/new_thumt_Dir/model/transformer-baseline-1gpu/model.ckpt-2002', added)
new_path = os.path.join('/data/xqzhou/new_thumt_Dir/model/transformer-baseline-1gpu/eval', added)

print("old_path:%s new_path:%s" % (old_path, new_path))

If my analysis is correct, the following change fixes the problem; it may be useful for later users.

hooks.py, line 282:

if added is not None:
    old_path = os.path.join(self._base_dir, os.path.basename(added))
    new_path = os.path.join(self._save_path, os.path.basename(added))
    old_files = tf.gfile.Glob(old_path + "*")
    tf.logging.info("Copying %s to %s" % (old_path, new_path))

fast_align

Hello, would you offer the code for the aligner defined in mapping.py:
aligner = '~/fast_align/build/fast_align'
Thanks.

BLEU scores

Hi, the training has finished, but when I run the following command there is an error. Would you mind giving me some guidance? Thanks!

(venv-2.7.14) ubuntu@ubuntu:~/python2.7/tensorflow$ multi-bleu.perl -lc newstest2015.tc.en < newstest2015.trans.norm > evalResult
multi-bleu.perl: command not found
(venv-2.7.14) ubuntu@ubuntu:~/python2.7/tensorflow$ ls
THUMT corpus.tc.en newstest2014-deen-ref.en.sgm newstest2015-deen-ref.en.sgm newstest2015.trans newstest2016.tc.en train vocab.de bpe32k dev.tgz newstest2014-deen-src.de.sgm newstest2015-deen-src.de.sgm newstest2015.trans.norm nohup.out train_2 vocab.en corpus.tc.32k.de evalResult newstest2014-deen-src.en.sgm newstest2015-ende-ref.de.sgm newstest2016-deen-ref.en.sgm requirment.txt train_bak corpus.tc.32k.de.shuf info.txt newstest2014.tc.32k.de newstest2015-ende-src.en.sgm newstest2016-deen-src.de.sgm run.txt train_seq2seq corpus.tc.32k.en mteval-14.pl newstest2014.tc.32k.en newstest2015.tc.32k.de newstest2016-ende-ref.de.sgm subword-nmt true.tgz corpus.tc.32k.en.shuf multi-bleu.perl newstest2014.tc.de newstest2015.tc.de newstest2016-ende-src.en.sgm tarin_1 vocab.32k.de.txt corpus.tc.de newstest2014-deen-ref.de.sgm newstest2014.tc.en newstest2015.tc.en newstest2016.tc.de test.py vocab.32k.en.txt

When I tried it like this, another error occurred:

(venv-2.7.14) ubuntu@ubuntu:~/python2.7/tensorflow$ perl multi-bleu.perl -lc newstest2015.tc.en < newstest2015.trans.norm > evalResult
Use of uninitialized value in division (/) at multi-bleu.perl line 139, <STDIN> line 2169.
Use of uninitialized value in division (/) at multi-bleu.perl line 139, <STDIN> line 2169.
It is in-advisable to publish scores from multi-bleu.perl. The scores depend on your tokenizer, which is unlikely to be reproducible from your paper or consistent across research groups. Instead you should detokenize then use mteval-v14.pl, which has a standard tokenization. Scores from multi-bleu.perl can still be used for internal purposes when you have a consistent tokenizer.

The error of trainer

Here is my command. What causes the error and how can I solve it?

mmyin@amax:/data2/mmyin/experiment$ python THUMT/thumt/bin/trainer.py --input ch.txt.32k.shuf en.txt.32k.shuf --vocabulary vocab.32k.ch.txt vocab.32k.en.txt --model rnnsearch --validation ch.volid.32k --references en.volid.32k --parameters=batch_size=128,device_lise=[2],train_steps=200000
Traceback (most recent call last):
  File "THUMT/thumt/bin/trainer.py", line 15, in <module>
    import thumt.data.cache as cache
ImportError: No module named thumt.data.cache

Thanks in advance.

Decoding output format

I ran the decoding command as:
python thumt/bin/translator.py --models transformer --input newstest2015.tc.32k.de --output newstest2015.trans --vocabulary vocab.32k.de.txt vocab.32k.en.txt --checkpoints /home/shweta/sandip/thumt/THUMT/train/eval/model.ckpt-4000 --parameters=device_list=[0]

Here, one thing I do not understand is why the output file is created with a .trans extension. How do we read/open that output file? Can it be opened in a text editor? I am using an NVIDIA GPU with 4 GB of memory (driver 384.130).

a very slight error in scripts/train.py

Hi,
THUMT is a very cool toolkit for training NMT or conversation models. I just found a very slight error in scripts/train.py:

print 'The training started at ' + tb.strftime("%Y-%m-%d %H:%M:%S") + \
    ' and ended at ' + te.strftime("%Y-%m-%d %H:%M:%S") + \
    '. The total training time is %.2f hour(s).' % ((te - tb).seconds / 3600.0)

In fact, (te - tb) here is an instance of datetime.timedelta, and (te - tb).seconds only counts the seconds component, excluding the days.
So after training a task, THUMT currently displays a total time of less than 24 hours.
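
A minimal sketch of one possible fix, using the standard library's timedelta.total_seconds(), which includes whole days, whereas .seconds wraps around every 24 hours:

from datetime import datetime, timedelta

tb = datetime(2018, 1, 1, 0, 0, 0)
te = tb + timedelta(days=1, hours=2)

print((te - tb).seconds / 3600.0)          # 2.0:  the full day is dropped
print((te - tb).total_seconds() / 3600.0)  # 26.0: the actual training time in hours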

MRT in transformer

Is it possible to add MRT to the Transformer model? Or could you explain how to run inference on sentences during training (i.e., decode the training batch during training)?

Memory blow-up during training

In this function in data.py:

def format_num(self, x):
    result = str(round(x, 2))
    while result[-3] != '.':
        result += '0'
    return result

If x is not an ordinary float, for example if x is inf (which is what I ran into), this becomes an infinite loop: result grows without bound, memory keeps increasing, and the process is eventually killed.
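
A guarded variant along the lines the report suggests might look like the sketch below; this is illustrative only, not THUMT's actual data.py code:

import math

def format_num(x):
    # Guard against non-finite values (inf/NaN), which made the original
    # padding loop run forever, as described in the report above.
    if math.isinf(x) or math.isnan(x):
        return str(x)
    return "%.2f" % x   # always two decimal places, no manual padding loop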

SST with monolingual target data only?

I'm trying to use the Semi-Supervised Training on the theano branch, but only with monolingual data from the target language (and no monolingual data from the source language).
They do the same in Cheng et al. (2016), but it seems like THUMT requires monolingual data from both languages. Am I missing something or is there an easy way to do this without changing a lot of code?
Thank you!

Purpose of the cache

Hello,

What is the purpose of the cache? The cache is adjusted via update_cycle; why does each training step take longer (roughly four times longer) after increasing update_cycle from 1 to 4?

Online demo not working.

I want to try the neural machine translation system to translate from Chinese to English, but I am not sure how to start. Could you please provide a tutorial?

Problem with clip in tools

Hi,
I previously asked about frequently running into NaN while training a dialogue generation model (I have not tested the machine translation model). The suggestion at the time was to adjust the learning rate; I tried that many times, but the problem persisted. I recently noticed that the clip function in tools.py only handles NaN, while in practice the gradients usually become Inf first, which then causes the cost to become NaN. After adding handling for Inf, the NaN problem has not reappeared. Is this solution correct?
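
A minimal NumPy sketch of the idea described above (zeroing non-finite gradient entries before clipping); this is illustrative only, not the actual tools.py code, and sanitize_and_clip is a hypothetical name:

import numpy as np

def sanitize_and_clip(grad, threshold):
    # Replace Inf/NaN entries with 0 so they cannot poison the norm,
    # then apply ordinary norm clipping.
    grad = np.where(np.isfinite(grad), grad, 0.0)
    norm = np.linalg.norm(grad)
    if norm > threshold:
        grad = grad * (threshold / norm)
    return grad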

How to restore trained model and go on training with saved checkpoints (not using the method in translator.py)?

I used the method described here by LingjiaDeng to restore a checkpoint from the train/eval folder. The code is exactly as below; I ran it in IPython:

saver = tf.train.import_meta_graph('train/eval/model.ckpt-50000.meta')
sess = tf.Session()
saver.restore(sess, 'train/eval/model.ckpt-50000')

The error is as follows:

Caused by op u'create_train_op/gradients/parallel_1/transformer/Gather_grad/Shape', defined at:
  File "/usr/local/bin/ipython", line 11, in <module>
    sys.exit(start_ipython())
  File "/usr/local/lib/python2.7/dist-packages/IPython/__init__.py", line 119, in start_ipython
    return launch_new_instance(argv=argv, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/usr/local/lib/python2.7/dist-packages/IPython/terminal/ipapp.py", line 355, in start
    self.shell.mainloop()
  File "/usr/local/lib/python2.7/dist-packages/IPython/terminal/interactiveshell.py", line 493, in mainloop
    self.interact()
  File "/usr/local/lib/python2.7/dist-packages/IPython/terminal/interactiveshell.py", line 484, in interact
    self.run_cell(code, store_history=True)
  File "/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2718, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2822, in run_ast_nodes
    if self.run_code(code, result):
  File "/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2882, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-9adf313a28f3>", line 1, in <module>
    saver = tf.train.import_meta_graph('train_bpe_nosrctgtpos/eval/model.ckpt-50000.meta')
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1810, in import_meta_graph
    **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/meta_graph.py", line 660, in import_scoped_meta_graph
    **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/meta_graph.py", line 660, in import_scoped_meta_graph
    producer_op_list=producer_op_list)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/importer.py", line 313, in import_graph_def
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Cannot colocate nodes 'create_train_op/gradients/parallel_1/transformer/Gather_grad/Shape' and 'create_train_op/Adam/update_transformer/source_embedding/sub_3/x: Cannot merge devices with incompatible ids: '/device:GPU:1' and '/device:GPU:0'
         [[Node: create_train_op/gradients/parallel_1/transformer/Gather_grad/Shape = Const[_class=["loc:@parallel_1/transformer/Gather", "loc:@transformer/source_embedding"], dtype=DT_INT64, value=Tensor<type: int64 shape: [2] values: 34673 512>, _device="/device:GPU:1"]()]]

What is this error? Is there a more elegant way to resume training from pre-trained model parameters?

I found that if I train the model with a single GPU, so that there is no data parallelism in parallel_model, the checkpoint can be successfully reloaded in the way above. Is that the problem?

Thanks very much.
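
One thing that may be relevant (an assumption, not a confirmed fix): tf.train.import_meta_graph accepts a clear_devices argument that drops the device placements stored in the meta graph, which is what the colocation error above complains about. A minimal sketch using the paths from the report:

import tensorflow as tf

# Import the graph without the recorded /device:GPU:* assignments,
# then restore the variables as before.
saver = tf.train.import_meta_graph('train/eval/model.ckpt-50000.meta',
                                   clear_devices=True)
with tf.Session() as sess:
    saver.restore(sess, 'train/eval/model.ckpt-50000')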

Character-based NMT

Have you added features for character-based translation? If so, which papers did you reference?

Monolingual data never used??

I don't think the monolingual data is ever used. It is loaded into the array self.target_mono (data.py) but never read from again.

I checked this by printing out xm, ym = data.next_mono() (train.py) and found that none of my monolingual data was ever in it.

Additionally (I'm not sure it matters), I believe that if self.config['sort_batches'] is even, the condition "if not self.inited or self.batch_id == self.config['sort_batches']" in next_mono (data.py) is never entered (thus monolingual data is never added), and if it is odd, the same condition in next (data.py) is never entered, so bilingual data is never added.

Bleu Score error

I ran the command below:
sed -r 's/(@@ )|(@@ ?$)//g' < newstest2015.trans > newstest2015.trans.norm

And then ran the command for the BLEU score:
multi-bleu.perl -lc newstest2015.tc.en < newstest2015.trans.norm > evalResult

and got the following error:
multi-bleu.perl: command not found

Is any package required, or are there more parameters to pass?

Cannot use the GPU?

Does CUDA have to be version 9.0? My machine has CUDA 8.0 and cannot use the GPU for training.

The error of training an RNNsearch model

Here is my command. What causes the error and how can I solve it?

(venv-2.7.14) ubuntu@ubuntu:~/python2.7/tensorflow$ python THUMT/thumt/bin/trainer.py --input corpus.tc.32k.de.shuf corpus.tc.32k.en.shuf --vocabulary vocab.32k.de.txt vocab.32k.en.txt --model rnnsearch --validation newstest2014.tc.32k.de --references newstest2014.tc.32k.en --parameters=batch_size=128,device_list=[0],train_steps=200000
INFO:tensorflow:Restoring hyper parameters from /home/ubuntu/python2.7/tensorflow/train/params.json
Traceback (most recent call last):
  File "THUMT/thumt/bin/trainer.py", line 472, in <module>
    main(parse_args())
  File "THUMT/thumt/bin/trainer.py", line 317, in main
    params = import_params(args.output, args.model, params)
  File "THUMT/thumt/bin/trainer.py", line 122, in import_params
    params.parse_json(json_str)
  File "/home/ubuntu/.pyenv/versions/venv-2.7.14/lib/python2.7/site-packages/tensorflow/contrib/training/python/training/hparam.py", line 587, in parse_json
    return self.override_from_dict(values_map)
  File "/home/ubuntu/.pyenv/versions/venv-2.7.14/lib/python2.7/site-packages/tensorflow/contrib/training/python/training/hparam.py", line 539, in override_from_dict
    self.set_hparam(name, value)
  File "/home/ubuntu/.pyenv/versions/venv-2.7.14/lib/python2.7/site-packages/tensorflow/contrib/training/python/training/hparam.py", line 490, in set_hparam
    param_type, is_list = self._hparam_types[name]
KeyError: u'num_hidden_layers'

Thanks in advance.

AttributeError: 'module' object has no attribute 'rnn_cell'

Hi,

When I try "python $THUFOLDER/thumt/bin/trainer.py -h"
it reports like:

Traceback (most recent call last):
File "/training/Users/THUMT/THUMT/thumt/bin/trainer.py", line 17, in
import thumt.models as models
File "/training/Users/THUMT/THUMT/thumt/models/init.py", line 4, in
import thumt.models.seq2seq
File "/training/Users/THUMT/THUMT/thumt/models/seq2seq.py", line 11, in
import thumt.layers as layers
File "/training/Users/THUMT/THUMT/thumt/layers/init.py", line 6, in
import thumt.layers.rnn_cell
File "/training/Users/THUMT/THUMT/thumt/layers/rnn_cell.py", line 13, in
class LegacyGRUCell(tf.nn.rnn_cell.RNNCell):
AttributeError: 'module' object has no attribute 'rnn_cell'

Do you have any idea to fix it?
Thanks.

Is the batch size the same for MRT and MLE?

I tried both MRT and MLE without changing the batch size, and I got an out-of-memory error for MRT.
It looks like the model has to process 80 * 100 samples if I set the batch size to 80 and sampleN to 100, strictly following the settings in the original MRT paper. However, in my experience, at least 4 GB of intermediate variables have to be stored to compute the gradients when training a usable NMT model with a batch size of 256+. I wonder whether it is correct to set the batch size to 80 for MRT, or whether that applies only to MLE.

RuntimeError:Graph is finalized and cannot be modified. when validating

Environment:
python 2.7.15
tensorflow 1.7.0
Command:
python bin/trainer.py --input mt_dataset/corpus.tc.32k.de.shuf mt_dataset/corpus.tc.32k.en.shuf --vocabulary vocab.32k.de.txt vocab.32k.en.txt --model transformer --validation mt_dataset/newstest2014.tc.32k.de --references mt_dataset/newstest2014.tc.32k.en --parameters=batch_size=6250,device_list=[3],train_steps=2000,eval_steps=100
The following error occurs during validation:

Traceback (most recent call last):
  File "bin/trainer.py", line 472, in <module>
    main(parse_args())
  File "bin/trainer.py", line 468, in main
            "source_length": tf.placeholder(tf.int32, [None], "source_length")
    sess.run_step_fn(step_fn)
  File "/data1/xxx/anaconda3/envs/python2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 607, in run_step_fn
    return self._sess.run_step_fn(step_fn, self._tf_sess(), run_with_hooks=None)
  File "/data1/xxx/anaconda3/envs/python2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1038, in run_step_fn
    return self._sess.run_step_fn(step_fn, raw_session, run_with_hooks)
  File "/data1/xxx/anaconda3/envs/python2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 957, in run_step_fn
    return step_fn(_MonitoredSession.StepContext(raw_session, run_with_hooks))
  File "bin/trainer.py", line 458, in step_fn
    return step_context.run_with_hooks(ops["train_op"])
  File "/data1/xxx/anaconda3/envs/python2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 634, in run_with_hooks
    return self._run_with_hooks_fn(*args, **kwargs)
  File "/data1/xxx/anaconda3/envs/python2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1113, in run
    raise six.reraise(*original_exc_info)
  File "/data1/xxx/anaconda3/envs/python2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1098, in run
    return self._sess.run(*args, **kwargs)
  File "/data1/xxx/anaconda3/envs/python2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1178, in run
    run_metadata=run_metadata))
  File "/data1/xxx/gen_comment/THUMT/thumt/utils/hooks.py", line 279, in after_run
    self._session_config)
  File "/data1/xxx/gen_comment/THUMT/thumt/utils/hooks.py", line 168, in _evaluate
    print(tf.shape(outputs))
  File "/data1/xxx/anaconda3/envs/python2/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 285, in shape
    return shape_internal(input, name, optimize=True, out_type=out_type)
  File "/data1/xxx/anaconda3/envs/python2/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 309, in shape_internal
    input_tensor = ops.convert_to_tensor(input)
  File "/data1/xxx/anaconda3/envs/python2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 950, in convert_to_tensor
    as_ref=False)
  File "/data1/xxx/anaconda3/envs/python2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1040, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/data1/xxx/anaconda3/envs/python2/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 235, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/data1/xxx/anaconda3/envs/python2/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 220, in constant
    name=name).outputs[0]
  File "/data1/xxx/anaconda3/envs/python2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3262, in create_op
    self._check_not_finalized()
  File "/data1/xxx/anaconda3/envs/python2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2896, in _check_not_finalized
    raise RuntimeError("Graph is finalized and cannot be modified.")```

Debug printing shows that the error occurs at this point in utils/hooks.py:
        with tf.train.MonitoredSession(session_creator=sess_creator) as sess:
            while not sess.should_stop():
                feats = sess.run(features)
                outputs = sess.run(predictions, feed_dict={
                    placeholders["source"]: feats["source"],
                    placeholders["source_length"]: feats["source_length"]
                })
                # shape: [batch, len]
                outputs = outputs.tolist()
                # shape: ([batch, len], ..., [batch, len])

Questions about BPE

1. If BPE is not used, will it affect the results? If so, will BLEU drop a lot?
2. In the "tips for transformer" article I read that batch_size has a large effect on the results. Have you run into related problems, and what is the minimum batch_size that should be set? (The default batch_size does not fit into GPU memory on a 1080.)

About train_steps

How do I determine the end of training? The parameter train_steps in train.py is 100000, but the program keeps running after 100000 steps. Thank you very much.

Comparison with Tensor2Tensor

Hello,

I trained THUMT's transformer_base and Tensor2Tensor's transformer_base on one million Chinese-English parallel sentence pairs.
THUMT was configured with constant_batch_size=True, batch_size=64, and update_cycle=1; t2t also used batch_size=64. Both were trained on a single GPU.

During training, THUMT averaged 0.63 seconds per step, while t2t averaged 0.08 seconds per step.
For translation I did not record exact numbers, but roughly, when translating ten thousand lines, THUMT was more than ten times faster than t2t.

Are my experimental results reasonable? Where might the problem be?

Log file question

thumt-log
Here is my log file, which is different from the log provided in the official manual. I cannot find any information about the average cost or BLEU score.

Did I make a mistake or miss some detail?

My command is as follows:

#!/bin/bash
python ~/THUMT-theano/scripts/trainer.py --config-file ~/THUMT-theano/config/THUMT.config --trn-src-file ~/data/ch.train --trn-trg-file ~/data/en.train --vld-src-file ~/data/ch.dev --vld-trg-file ~/data/en.dev --device gpu1

Does the Transformer model support multi-GPU training or not?

I tried to reproduce the experiment with the Transformer model on multiple GPUs, but found that many lines in the decoded file *trans.norm are empty; the RNNsearch experiment on one GPU does not have this problem.
Branch: tensorflow/master

what's the "labels" here in RNNsearch.py ?

Hi, I try to run the thumt with the RNNsearch model, but I found couldn't.
A variable named "labels" seems not be defined.
==========Line 307===============
ce = layers.nn.smoothed_softmax_cross_entropy_with_logits(
logits=logits,
labels=labels,
smoothing=params.label_smoothing,
normalize=True
)
