thunlp-mt / thumt

An open-source neural machine translation toolkit developed by Tsinghua Natural Language Processing Group

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%
neural-machine-translation machine-translation deep-learning

thumt's Introduction

THUMT: An Open Source Toolkit for Neural Machine Translation

Contents

  • Introduction
  • Online Demo
  • Implementations
  • Notable Features
  • Documentation
  • License
  • Citation
  • Development Team
  • Contact
  • Derivative Repositories

Introduction

Machine translation is a natural language processing task that aims to translate between natural languages automatically using computers. Recent years have witnessed the rapid development of end-to-end neural machine translation, which has become the new mainstream method in practical MT systems.

THUMT is an open-source toolkit for neural machine translation developed by the Natural Language Processing Group at Tsinghua University. The website of THUMT is: http://thumt.thunlp.org/.

Online Demo

The online demo of THUMT is available at http://translate.thumt.cn/. The languages involved include Ancient Chinese, Arabic, Chinese, English, French, German, Indonesian, Japanese, Portuguese, Russian, and Spanish.

Implementations

THUMT currently has three main implementations: THUMT-Theano, THUMT-TensorFlow, and THUMT-PyTorch.

The following table summarizes the features of the three implementations:

Implementation | Model                           | Criterion     | Optimizer           | LRP
Theano         | RNNsearch                       | MLE, MRT, SST | SGD, AdaDelta, Adam | RNNsearch
TensorFlow     | Seq2Seq, RNNsearch, Transformer | MLE           | Adam                | RNNsearch, Transformer
PyTorch        | Transformer                     | MLE           | SGD, Adadelta, Adam | N.A.

We recommend using THUMT-PyTorch or THUMT-TensorFlow, which deliver better translation performance than THUMT-Theano. We will keep adding new features to THUMT-PyTorch and THUMT-TensorFlow.

Notable Features

  • Transformer (Vaswani et al., 2017)
  • Multi-GPU training & decoding
  • Multi-worker distributed training
  • Mixed precision training & decoding
  • Model ensemble & averaging
  • Gradient aggregation (see the sketch after this list)
  • TensorBoard for visualization
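
Of the features above, gradient aggregation means accumulating gradients over several micro-batches before applying one optimizer step, so that a large effective batch size fits in limited GPU memory. Below is a minimal, hypothetical PyTorch sketch of the idea; it is not THUMT's actual code, and the toy model and data are placeholders:

import torch
import torch.nn as nn

model = nn.Linear(8, 8)                                   # stand-in for a real NMT model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
update_cycle = 4                                          # micro-batches per optimizer step

optimizer.zero_grad()
for step, (x, y) in enumerate([(torch.randn(2, 8), torch.randn(2, 8))] * 8):
    loss = nn.functional.mse_loss(model(x), y) / update_cycle  # scale so the sum averages
    loss.backward()                                       # gradients accumulate in .grad
    if (step + 1) % update_cycle == 0:
        optimizer.step()                                  # apply the aggregated gradient
        optimizer.zero_grad()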

Documentation

The documentation of the PyTorch implementation is available here.

License

The source code is dual licensed. Open source licensing is under the BSD-3-Clause, which allows free use for research purposes. For commercial licensing, please email [email protected].

Citation

Please cite the following paper:

Zhixing Tan, Jiacheng Zhang, Xuancheng Huang, Gang Chen, Shuo Wang, Maosong Sun, Huanbo Luan, Yang Liu. THUMT: An Open Source Toolkit for Neural Machine Translation. AMTA 2020.

Jiacheng Zhang, Yanzhuo Ding, Shiqi Shen, Yong Cheng, Maosong Sun, Huanbo Luan, Yang Liu. 2017. THUMT: An Open Source Toolkit for Neural Machine Translation. arXiv:1706.06415.

Development Team

Project leaders: Maosong Sun, Yang Liu, Huanbo Luan

Project members:

Theano: Jiacheng Zhang, Yanzhuo Ding, Shiqi Shen, Yong Cheng

TensorFlow: Zhixing Tan, Jiacheng Zhang, Xuancheng Huang, Gang Chen, Shuo Wang, Zonghan Yang

PyTorch: Zhixing Tan, Gang Chen

Contact

If you have questions, suggestions and bug reports, please email [email protected].

Derivative Repositories

  • UCE4BT (Improving Back-Translation with Uncertainty-based Confidence Estimation)
  • L2Copy4APE (Learning to Copy for Automatic Post-Editing)
  • Document-Transformer (Improving the Transformer Translation Model with Document-Level Context)
  • PR4NMT (Prior Knowledge Integration for Neural Machine Translation using Posterior Regularization)

thumt's People

Contributors

alphadl, efanzh, glaceon31, grittychen, minicheshire, playinf, shuo-git, stevensgeek41, thudcsly, thumt, xc-kiwiberry, xiaoqingnlp, xiaotaw


thumt's Issues

batch size of Minimum risk training

Hi, it seems that the batch size for minimum risk training has to be set to 1, which would make training much slower when the dataset is very large. Is there any way to avoid this?

Has the latest version added a multi-GPU training module?

That is an important and efficient trick for speeding up training. I saw your team's earlier paper, which said that a multi-GPU function would be implemented soon; however, I cannot find any description of it in your README.

ANN acceleration

Do you plan to implement ANN acceleration?

How to decode beam-size sentences?

I followed the user manual to train and decode a translation task, but the decoding output file contains only one decoded sentence per input even though I set the beam size to 5. How can I decode beam-size sentences?
This is my running script for decoding:

python THUMT/thumt/bin/translator.py
--models transformer
--input newstest2015.tc.32k.de
--output newstest2015.trans
--vocabulary vocab.32k.de.txt vocab.32k.en.txt
--checkpoints train/eval
--parameters=device_list=[0],beam_size=5

What hardware configuration is required?

I have two Tesla K40M GPUs with 12 GB of memory each, and the program hangs as soon as it starts running.
It repeatedly ends up in this state:
INFO:tensorflow:seq2seq/softmax/matrix shape (1000, 878534)
INFO:tensorflow:seq2seq/source_embedding/bias shape (1000,)
INFO:tensorflow:seq2seq/source_embedding/embedding shape (1804778, 1000)
INFO:tensorflow:seq2seq/target_embedding/bias shape (1000,)
INFO:tensorflow:seq2seq/target_embedding/embedding shape (878534, 1000)
INFO:tensorflow:Total trainable variables size: 3626758534
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Create EvaluationHook.
INFO:tensorflow:Graph was finalized.
2018-05-30 10:42:22.524117: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
The first few times the system killed the process automatically; the last time it simply hung.

Confused about vocab.py/get_control_mapping

Great project! Thanks for your contribution.
While reading through the project, I became confused about the get_control_mapping method in vocab.py. This method seems to convert the vocabulary table (a list) into a vocabulary mapping (a dictionary). But the "symbols" parameter is not used in this method, since the vocabulary table already contains the special symbols. And even when those symbols do not exist in the vocabulary table, this method cannot add them to the mapping dictionary.
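
For reference, here is a minimal hypothetical sketch of the kind of list-to-dictionary mapping the question describes; build_mapping and its parameters are illustrative names, not the actual vocab.py code:

def build_mapping(vocab_list, control_symbols=()):
    # Map each token to its index in the vocabulary list.
    mapping = {token: idx for idx, token in enumerate(vocab_list)}
    # Such a mapping can only report missing control symbols, not add them.
    missing = [s for s in control_symbols if s not in mapping]
    if missing:
        raise ValueError("control symbols missing from vocabulary: %s" % missing)
    return mapping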

Compatibility Tips for TensorFlow-1.4.0

I use Python 2.7 and tensorflow==1.4.0.
The error is:
me: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
Traceback (most recent call last):
File "thumt/bin/trainer.py", line 466, in
main(parse_args())
File "thumt/bin/trainer.py", line 459, in main
sess.run_step_fn(restore_fn)
AttributeError: 'MonitoredSession' object has no attribute 'run_step_fn'

We look forward to your reply. Thank you very much.

The error of decoding

(venv-2.7.14) ubuntu@ubuntu:~/python2.7/tensorflow$ sed -r ’s/(@@ )|(@@ ?$)//g’ < newstest2015.trans > newstest2015.trans.norm
bash: syntax error near unexpected token `('
How can I solve this problem?
Thanks!

How about the performance?

I reproduced this experiment with the RNNsearch and Seq2Seq models, and got an average BLEU of 37.354 (RNNsearch) and 21.774 (Seq2Seq) on a zh-en corpus. I would like to know whether my baseline scores are normal; the Seq2Seq score seems a little low. Could you share your performance numbers?

A typo in the decoding section (Section 3.3) of the documentation

1 $ python THUMT/thumt/bin/translator.py --models transformer
2 --input newstest2015.tc.32k.de --output newstest2015.trans
3 --vocabulary vocab.32k.de vocab.32k.en --checkpoints train/eval
4 --parameters=device_list[0]

Line 4 should read --parameters=device_list=[0]

GPU support

We tested on a single Tesla K80 GPU with the following command:
sudo python trainer.py --config-file config/THUMT.config --trn-src-file data/train.src --trn-trg-file data/train.trg --vld-src-file data/valid.src --vld-trg-file data/valid.trg --device gpu0 --replace-unk

The program runs, but checking with nvidia-smi shows that the GPU is not being used.

Something is wrong when I pass the params.output argument: the model is not copied correctly to path/to/eval.

hooks.py, line 282:

if added is not None:
    old_path = os.path.join(self._base_dir, added)
    new_path = os.path.join(self._save_path, added)
    old_files = tf.gfile.Glob(old_path + "*")
    tf.logging.info("Copying %s to %s" % (old_path, new_path))

The following is a simple reproduction of the problem; it may help you confirm it:

import os
added = '/data/xqzhou/new_thumt_Dir/model/transformer-baseline-1gpu/model.ckpt-2002'
old_path = os.path.join('/data/xqzhou/new_thumt_Dir/model/transformer-baseline-1gpu/model.ckpt-2002', added)
new_path = os.path.join('/data/xqzhou/new_thumt_Dir/model/transformer-baseline-1gpu/eval', added)

print("old_path:%s new_path:%s" % (old_path, new_path))

If my analysis is correct, the following change fixes the problem; it may be useful for later users.

hooks.py, line 282:

if added is not None:
    old_path = os.path.join(self._base_dir, os.path.basename(added))
    new_path = os.path.join(self._save_path, os.path.basename(added))
    old_files = tf.gfile.Glob(old_path + "*")
    tf.logging.info("Copying %s to %s" % (old_path, new_path))

fast_align

Hello, would you offer the code for the aligner defined in mapping.py:
aligner = '~/fast_align/build/fast_align'
Thanks.

BLEU scores

Hi, the training has finished, but when I run the following command there is an error. Would you mind giving me some guidance? Thanks!

(venv-2.7.14) ubuntu@ubuntu:~/python2.7/tensorflow$ multi-bleu.perl -lc newstest2015.tc.en < newstest2015.trans.norm > evalResult
multi-bleu.perl: command not found
(venv-2.7.14) ubuntu@ubuntu:~/python2.7/tensorflow$ ls
THUMT corpus.tc.en newstest2014-deen-ref.en.sgm newstest2015-deen-ref.en.sgm newstest2015.trans newstest2016.tc.en train vocab.de bpe32k dev.tgz newstest2014-deen-src.de.sgm newstest2015-deen-src.de.sgm newstest2015.trans.norm nohup.out train_2 vocab.en corpus.tc.32k.de evalResult newstest2014-deen-src.en.sgm newstest2015-ende-ref.de.sgm newstest2016-deen-ref.en.sgm requirment.txt train_bak corpus.tc.32k.de.shuf info.txt newstest2014.tc.32k.de newstest2015-ende-src.en.sgm newstest2016-deen-src.de.sgm run.txt train_seq2seq corpus.tc.32k.en mteval-14.pl newstest2014.tc.32k.en newstest2015.tc.32k.de newstest2016-ende-ref.de.sgm subword-nmt true.tgz corpus.tc.32k.en.shuf multi-bleu.perl newstest2014.tc.de newstest2015.tc.de newstest2016-ende-src.en.sgm tarin_1 vocab.32k.de.txt corpus.tc.de newstest2014-deen-ref.de.sgm newstest2014.tc.en newstest2015.tc.en newstest2016.tc.de test.py vocab.32k.en.txt

When I tried it like this, another error occurred:

(venv-2.7.14) ubuntu@ubuntu:~/python2.7/tensorflow$ perl multi-bleu.perl -lc newstest2015.tc.en < newstest2015.trans.norm > evalResult
Use of uninitialized value in division (/) at multi-bleu.perl line 139, <STDIN> line 2169.
Use of uninitialized value in division (/) at multi-bleu.perl line 139, <STDIN> line 2169.
It is in-advisable to publish scores from multi-bleu.perl. The scores depend on your tokenizer, which is unlikely to be reproducible from your paper or consistent across research groups. Instead you should detokenize then use mteval-v14.pl, which has a standard tokenization. Scores from multi-bleu.perl can still be used for internal purposes when you have a consistent tokenizer.

The error of trainer

Here is my command. What causes the error and how can I solve it?

mmyin@amax:/data2/mmyin/experiment$ python THUMT/thumt/bin/trainer.py --input ch.txt.32k.shuf en.txt.32k.shuf --vocabulary vocab.32k.ch.txt vocab.32k.en.txt --model rnnsearch --validation ch.volid.32k --references en.volid.32k --parameters=batch_size=128,device_lise=[2],train_steps=200000
Traceback (most recent call last):
  File "THUMT/thumt/bin/trainer.py", line 15, in <module>
    import thumt.data.cache as cache
ImportError: No module named thumt.data.cache

Thanks in advance.

Decoding output format

I ran the decoding command as:
python thumt/bin/translator.py --models transformer --input newstest2015.tc.32k.de --output newstest2015.trans --vocabulary vocab.32k.de.txt vocab.32k.en.txt --checkpoints /home/shweta/sandip/thumt/THUMT/train/eval/model.ckpt-4000 --parameters=device_list=[0]

Here, one thing I do not understand is why the output file is created with a .trans extension. How do we read/open that output file? Can it be opened in a text editor? I am using an NVIDIA GPU with 4 GB of memory (driver 384.130).

a very slight error in scripts/train.py

Hi,
THUMT is a very cool toolkit for training NMT or conversation models. I just found a very slight error in scripts/train.py:

print 'The training started at ' + tb.strftime("%Y-%m-%d %H:%M:%S") + \
    ' and ended at ' + te.strftime("%Y-%m-%d %H:%M:%S") + \
    '. The total training time is %.2f hour(s).' % ((te - tb).seconds / 3600.0)

In fact, (te - tb) here is an instance of datetime.timedelta, and (te - tb).seconds only counts the seconds component, excluding the days.
So after training a task, THUMT currently displays a total time of less than 24 hours.
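
A minimal sketch of one possible fix, using the standard library's timedelta.total_seconds(), which includes whole days, whereas .seconds wraps around every 24 hours:

from datetime import datetime, timedelta

tb = datetime(2018, 1, 1, 0, 0, 0)
te = tb + timedelta(days=1, hours=2)

print((te - tb).seconds / 3600.0)          # 2.0:  the full day is dropped
print((te - tb).total_seconds() / 3600.0)  # 26.0: the actual training time in hours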

MRT in transformer

Is it possible to add MRT to the Transformer model? Or could you explain how to run inference on sentences during training (i.e., decode the training batch during training)?

Memory blow-up during training

In this function in data.py:

def format_num(self, x):
    result = str(round(x, 2))
    while result[-3] != '.':
        result += '0'
    return result

If x is not an ordinary float, for example if x is inf (which is what I ran into), this becomes an infinite loop: result grows without bound, memory keeps increasing, and the process is eventually killed.
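
A guarded variant along the lines the report suggests might look like the sketch below; this is illustrative only, not THUMT's actual data.py code:

import math

def format_num(x):
    # Guard against non-finite values (inf/NaN), which made the original
    # padding loop run forever, as described in the report above.
    if math.isinf(x) or math.isnan(x):
        return str(x)
    return "%.2f" % x   # always two decimal places, no manual padding loop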

SST with monolingual target data only?

I'm trying to use the Semi-Supervised Training on the theano branch, but only with monolingual data from the target language (and no monolingual data from the source language).
They do the same in Cheng et al. (2016), but it seems like THUMT requires monolingual data from both languages. Am I missing something or is there an easy way to do this without changing a lot of code?
Thank you!

Purpose of the cache

Hello,

What is the purpose of the cache? The cache is adjusted via update_cycle; why does each training step take longer (roughly four times longer) after increasing update_cycle from 1 to 4?

Online demo not working.

I want to try the neural machine translation system to translate from Chinese to English, but I am not sure how to start. Could you please provide a tutorial?

Problem with clip in tools

Hi,
I previously asked about frequently running into NaN while training a dialogue generation model (I have not tested the machine translation model). The suggestion at the time was to adjust the learning rate; I tried that many times, but the problem persisted. I recently noticed that the clip function in tools.py only handles NaN, while in practice the gradients usually become Inf first, which then causes the cost to become NaN. After adding handling for Inf, the NaN problem has not reappeared. Is this solution correct?
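
A minimal NumPy sketch of the idea described above (zeroing non-finite gradient entries before clipping); this is illustrative only, not the actual tools.py code, and sanitize_and_clip is a hypothetical name:

import numpy as np

def sanitize_and_clip(grad, threshold):
    # Replace Inf/NaN entries with 0 so they cannot poison the norm,
    # then apply ordinary norm clipping.
    grad = np.where(np.isfinite(grad), grad, 0.0)
    norm = np.linalg.norm(grad)
    if norm > threshold:
        grad = grad * (threshold / norm)
    return grad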

How to restore trained model and go on training with saved checkpoints (not using the method in translator.py)?

I used the method described here by LingjiaDeng to restore a checkpoint from the train/eval folder. The code is exactly as below; I ran it in IPython:

saver = tf.train.import_meta_graph('train/eval/model.ckpt-50000.meta')
sess = tf.Session()
saver.restore(sess, 'train/eval/model.ckpt-50000')

The error is as follows:

Caused by op u'create_train_op/gradients/parallel_1/transformer/Gather_grad/Shape', defined at:
  File "/usr/local/bin/ipython", line 11, in <module>
    sys.exit(start_ipython())
  File "/usr/local/lib/python2.7/dist-packages/IPython/__init__.py", line 119, in start_ipython
    return launch_new_instance(argv=argv, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/usr/local/lib/python2.7/dist-packages/IPython/terminal/ipapp.py", line 355, in start
    self.shell.mainloop()
  File "/usr/local/lib/python2.7/dist-packages/IPython/terminal/interactiveshell.py", line 493, in mainloop
    self.interact()
  File "/usr/local/lib/python2.7/dist-packages/IPython/terminal/interactiveshell.py", line 484, in interact
    self.run_cell(code, store_history=True)
  File "/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2718, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2822, in run_ast_nodes
    if self.run_code(code, result):
  File "/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2882, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-9adf313a28f3>", line 1, in <module>
    saver = tf.train.import_meta_graph('train_bpe_nosrctgtpos/eval/model.ckpt-50000.meta')
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1810, in import_meta_graph
    **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/meta_graph.py", line 660, in import_scoped_meta_graph
    **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/meta_graph.py", line 660, in import_scoped_meta_graph
    producer_op_list=producer_op_list)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/importer.py", line 313, in import_graph_def
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Cannot colocate nodes 'create_train_op/gradients/parallel_1/transformer/Gather_grad/Shape' and 'create_train_op/Adam/update_transformer/source_embedding/sub_3/x: Cannot merge devices with incompatible ids: '/device:GPU:1' and '/device:GPU:0'
         [[Node: create_train_op/gradients/parallel_1/transformer/Gather_grad/Shape = Const[_class=["loc:@parallel_1/transformer/Gather", "loc:@transformer/source_embedding"], dtype=DT_INT64, value=Tensor<type: int64 shape: [2] values: 34673 512>, _device="/device:GPU:1"]()]]

What is this error? Is there a more elegant way to resume training from pre-trained model parameters?

I found that if I train the model with a single GPU, so that there is no data parallelism in parallel_model, the checkpoint can be successfully reloaded in the way above. Is that the problem?

Thanks very much.
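
One thing that may be relevant (an assumption, not a confirmed fix): tf.train.import_meta_graph accepts a clear_devices argument that drops the device placements stored in the meta graph, which is what the colocation error above complains about. A minimal sketch using the paths from the report:

import tensorflow as tf

# Import the graph without the recorded /device:GPU:* assignments,
# then restore the variables as before.
saver = tf.train.import_meta_graph('train/eval/model.ckpt-50000.meta',
                                   clear_devices=True)
with tf.Session() as sess:
    saver.restore(sess, 'train/eval/model.ckpt-50000')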

Character-based NMT

Have you added features for character-based translation? If so, which papers did you reference?

Monolingual data never used??

I don't think the monolingual data is ever used. It is loaded into the array self.target_mono (data.py) but never read from again.

I checked this by printing out xm, ym = data.next_mono() (train.py) and found that none of my monolingual data was ever in it.

Additionally (I'm not sure it matters), I believe that if self.config['sort_batches'] is even, the condition "if not self.inited or self.batch_id == self.config['sort_batches']" in next_mono (data.py) is never entered (thus monolingual data is never added), and if it is odd, the same condition in next (data.py) is never entered, so bilingual data is never added.

Bleu Score error

I ran the command below:
sed -r 's/(@@ )|(@@ ?$)//g' < newstest2015.trans > newstest2015.trans.norm

And then ran the command for the BLEU score:
multi-bleu.perl -lc newstest2015.tc.en < newstest2015.trans.norm > evalResult

and got the following error:
multi-bleu.perl: command not found

Is any package required, or are there more parameters to pass?

Cannot use the GPU?

Does CUDA have to be version 9.0? My machine has CUDA 8.0 and cannot use the GPU for training.

The error of training an RNNsearch model

Here is my command. What causes the error and how can I solve it?

(venv-2.7.14) ubuntu@ubuntu:~/python2.7/tensorflow$ python THUMT/thumt/bin/trainer.py --input corpus.tc.32k.de.shuf corpus.tc.32k.en.shuf --vocabulary vocab.32k.de.txt vocab.32k.en.txt --model rnnsearch --validation newstest2014.tc.32k.de --references newstest2014.tc.32k.en --parameters=batch_size=128,device_list=[0],train_steps=200000
INFO:tensorflow:Restoring hyper parameters from /home/ubuntu/python2.7/tensorflow/train/params.json
Traceback (most recent call last):
  File "THUMT/thumt/bin/trainer.py", line 472, in <module>
    main(parse_args())
  File "THUMT/thumt/bin/trainer.py", line 317, in main
    params = import_params(args.output, args.model, params)
  File "THUMT/thumt/bin/trainer.py", line 122, in import_params
    params.parse_json(json_str)
  File "/home/ubuntu/.pyenv/versions/venv-2.7.14/lib/python2.7/site-packages/tensorflow/contrib/training/python/training/hparam.py", line 587, in parse_json
    return self.override_from_dict(values_map)
  File "/home/ubuntu/.pyenv/versions/venv-2.7.14/lib/python2.7/site-packages/tensorflow/contrib/training/python/training/hparam.py", line 539, in override_from_dict
    self.set_hparam(name, value)
  File "/home/ubuntu/.pyenv/versions/venv-2.7.14/lib/python2.7/site-packages/tensorflow/contrib/training/python/training/hparam.py", line 490, in set_hparam
    param_type, is_list = self._hparam_types[name]
KeyError: u'num_hidden_layers'

Thanks in advance.

AttributeError: 'module' object has no attribute 'rnn_cell'

Hi,

When I try "python $THUFOLDER/thumt/bin/trainer.py -h"
it reports like:

Traceback (most recent call last):
File "/training/Users/THUMT/THUMT/thumt/bin/trainer.py", line 17, in
import thumt.models as models
File "/training/Users/THUMT/THUMT/thumt/models/init.py", line 4, in
import thumt.models.seq2seq
File "/training/Users/THUMT/THUMT/thumt/models/seq2seq.py", line 11, in
import thumt.layers as layers
File "/training/Users/THUMT/THUMT/thumt/layers/init.py", line 6, in
import thumt.layers.rnn_cell
File "/training/Users/THUMT/THUMT/thumt/layers/rnn_cell.py", line 13, in
class LegacyGRUCell(tf.nn.rnn_cell.RNNCell):
AttributeError: 'module' object has no attribute 'rnn_cell'

Do you have any idea to fix it?
Thanks.

Is the batch size the same for MRT and MLE?

I tried both MRT and MLE without changing the batch size, and I got an out-of-memory error for MRT.
It looks like the model has to process 80 * 100 samples if I set the batch size to 80 and sampleN to 100, strictly following the settings in the original MRT paper. However, in my experience, at least 4 GB of intermediate variables have to be stored to compute the gradients when training a usable NMT model with a batch size of 256+. I wonder whether it is correct to set the batch size to 80 for MRT, or whether that applies only to MLE.

RuntimeError:Graph is finalized and cannot be modified. when validating

Environment:
python 2.7.15
tensorflow 1.7.0
Command:
python bin/trainer.py --input mt_dataset/corpus.tc.32k.de.shuf mt_dataset/corpus.tc.32k.en.shuf --vocabulary vocab.32k.de.txt vocab.32k.en.txt --model transformer --validation mt_dataset/newstest2014.tc.32k.de --references mt_dataset/newstest2014.tc.32k.en --parameters=batch_size=6250,device_list=[3],train_steps=2000,eval_steps=100
The following error occurs during validation:

Traceback (most recent call last):
  File "bin/trainer.py", line 472, in <module>
    main(parse_args())
  File "bin/trainer.py", line 468, in main
            "source_length": tf.placeholder(tf.int32, [None], "source_length")
    sess.run_step_fn(step_fn)
  File "/data1/xxx/anaconda3/envs/python2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 607, in run_step_fn
    return self._sess.run_step_fn(step_fn, self._tf_sess(), run_with_hooks=None)
  File "/data1/xxx/anaconda3/envs/python2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1038, in run_step_fn
    return self._sess.run_step_fn(step_fn, raw_session, run_with_hooks)
  File "/data1/xxx/anaconda3/envs/python2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 957, in run_step_fn
    return step_fn(_MonitoredSession.StepContext(raw_session, run_with_hooks))
  File "bin/trainer.py", line 458, in step_fn
    return step_context.run_with_hooks(ops["train_op"])
  File "/data1/xxx/anaconda3/envs/python2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 634, in run_with_hooks
    return self._run_with_hooks_fn(*args, **kwargs)
  File "/data1/xxx/anaconda3/envs/python2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1113, in run
    raise six.reraise(*original_exc_info)
  File "/data1/xxx/anaconda3/envs/python2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1098, in run
    return self._sess.run(*args, **kwargs)
  File "/data1/xxx/anaconda3/envs/python2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1178, in run
    run_metadata=run_metadata))
  File "/data1/xxx/gen_comment/THUMT/thumt/utils/hooks.py", line 279, in after_run
    self._session_config)
  File "/data1/xxx/gen_comment/THUMT/thumt/utils/hooks.py", line 168, in _evaluate
    print(tf.shape(outputs))
  File "/data1/xxx/anaconda3/envs/python2/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 285, in shape
    return shape_internal(input, name, optimize=True, out_type=out_type)
  File "/data1/xxx/anaconda3/envs/python2/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 309, in shape_internal
    input_tensor = ops.convert_to_tensor(input)
  File "/data1/xxx/anaconda3/envs/python2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 950, in convert_to_tensor
    as_ref=False)
  File "/data1/xxx/anaconda3/envs/python2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1040, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/data1/xxx/anaconda3/envs/python2/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 235, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/data1/xxx/anaconda3/envs/python2/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 220, in constant
    name=name).outputs[0]
  File "/data1/xxx/anaconda3/envs/python2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3262, in create_op
    self._check_not_finalized()
  File "/data1/xxx/anaconda3/envs/python2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2896, in _check_not_finalized
    raise RuntimeError("Graph is finalized and cannot be modified.")```

Debug printing shows that the error occurs at this point in utils/hooks.py:
        with tf.train.MonitoredSession(session_creator=sess_creator) as sess:
            while not sess.should_stop():
                feats = sess.run(features)
                outputs = sess.run(predictions, feed_dict={
                    placeholders["source"]: feats["source"],
                    placeholders["source_length"]: feats["source_length"]
                })
                # shape: [batch, len]
                outputs = outputs.tolist()
                # shape: ([batch, len], ..., [batch, len])

Questions about BPE

1. If BPE is not used, will it affect the results? If so, will BLEU drop a lot?
2. In the "tips for transformer" article I read that batch_size has a large effect on the results. Have you run into related problems, and what is the minimum batch_size that should be set? (The default batch_size does not fit into GPU memory on a 1080.)

About train_steps

How do I determine the end of training? The parameter train_steps in train.py is 100000, but the program keeps running after 100000 steps. Thank you very much.

Comparison with Tensor2Tensor

Hello,

I trained THUMT's transformer_base and Tensor2Tensor's transformer_base on one million Chinese-English parallel sentence pairs.
THUMT was configured with constant_batch_size=True, batch_size=64, and update_cycle=1; t2t also used batch_size=64. Both were trained on a single GPU.

During training, THUMT averaged 0.63 seconds per step, while t2t averaged 0.08 seconds per step.
For translation I did not record exact numbers, but roughly, when translating ten thousand lines, THUMT was more than ten times faster than t2t.

Are my experimental results reasonable? Where might the problem be?

Log file question

thumt-log
Here is my log file, which is different from the log provided in the official manual. I cannot find any information about the average cost or BLEU score.

Did I make a mistake or miss some detail?

My command is as follows:

#!/bin/bash
python ~/THUMT-theano/scripts/trainer.py --config-file ~/THUMT-theano/config/THUMT.config --trn-src-file ~/data/ch.train --trn-trg-file ~/data/en.train --vld-src-file ~/data/ch.dev --vld-trg-file ~/data/en.dev --device gpu1

Does the Transformer model support multi-GPU training or not?

I tried to reproduce the experiment with the Transformer model on multiple GPUs, but found that many lines in the decoded file *trans.norm are empty; the RNNsearch experiment on one GPU does not have this problem.
Branch: tensorflow/master

what's the "labels" here in RNNsearch.py ?

Hi, I try to run the thumt with the RNNsearch model, but I found couldn't.
A variable named "labels" seems not be defined.
==========Line 307===============
ce = layers.nn.smoothed_softmax_cross_entropy_with_logits(
logits=logits,
labels=labels,
smoothing=params.label_smoothing,
normalize=True
)
