opennmt / opennmt-py

Open Source Neural Machine Translation and (Large) Language Models in PyTorch

Home Page: https://opennmt.net/

License: MIT License

Python 95.84% Shell 3.56% Perl 0.50% Dockerfile 0.09%
deep-learning pytorch machine-translation neural-machine-translation language-model llms

opennmt-py's Introduction

OpenNMT-py: Open-Source Neural Machine Translation and (Large) Language Models


OpenNMT-py is the PyTorch version of the OpenNMT project, an open-source (MIT) neural machine translation (and beyond!) framework. It is designed to be research-friendly, making it easy to try out new ideas in translation, language modeling, summarization, and many other NLP tasks. Several companies have also proven the code to be production-ready.

We love contributions! Please look at issues marked with the contributions welcome tag.

Before raising an issue, make sure you read the requirements and the Full Documentation examples.

Unless there is a bug, please use the Forum or Gitter to ask questions.


For beginners:

There is a step-by-step, fully explained tutorial (thanks to Yasmin Moslem): Tutorial

Please read and/or follow it before raising beginner issues.

Otherwise, you can have a look at the Quickstart steps.
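
As a rough sketch of what the Quickstart covers (the config and file names below are placeholders; the exact flags are documented there), the usual sequence with the command-line tools is:

onmt_build_vocab -config my_config.yaml -n_sample 10000
onmt_train -config my_config.yaml
onmt_translate -model run/model_step_1000.pt -src src-test.txt -output pred.txt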


New:

  • You will need PyTorch v2, preferably v2.2, which fixes some scaled_dot_product_attention issues.
  • LLM support with converters for: Llama (+ Mistral), OpenLlama, Redpajama, MPT-7B, Falcon.
  • Support for 8-bit and 4-bit quantization along with LoRA adapters, with or without checkpointing.
  • You can finetune 7B and 13B models on a single 24 GB RTX card with 4-bit quantization.
  • Inference can be forced in 4/8-bit using the same layer quantization as in finetuning.
  • Tensor parallelism when the model does not fit in one GPU's memory (both training and inference).
  • Once your model is finetuned, you can run inference either with OpenNMT-py or, faster, with CTranslate2.
  • MMLU evaluation script; see the results here.

For all use cases, including NMT, you can now use multi-query attention instead of multi-head attention (faster at training and inference) and remove the biases from all Linear layers (QKV as well as FeedForward modules).
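
As an illustration, these options are set in the training YAML config. The following is only a partial sketch in the spirit of the tutorials below; the exact option names and layer lists should be checked against the documentation for your version:

# 4-bit quantization + LoRA for finetuning a 7B model on a single 24 GB GPU (sketch)
quant_layers: ['w_1', 'w_2', 'linear_values', 'linear_query', 'linear_keys', 'final_linear']
quant_type: "bnb_NF4"
lora_layers: ['linear_values', 'linear_query', 'linear_keys', 'final_linear']
lora_rank: 2
lora_dropout: 0.05
lora_alpha: 8
lora_embedding: false
# multi-query attention and bias-free Linear layers
multiquery: true
add_qkvbias: false
add_ffnbias: false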

If you used previous versions of OpenNMT-py, you can check the Changelog or the Breaking Changes


Tutorials:

  • How to replicate Vicuna with a 7B or 13B Llama (or OpenLlama, MPT-7B, Redpajama) language model: Tuto Vicuna
  • How to finetune NLLB-200 with your dataset: Tuto Finetune NLLB-200
  • How to create a simple OpenNMT-py REST Server: Tuto REST
  • How to create a simple Web Interface: Tuto Streamlit
  • Replicate the WMT17 en-de experiment: WMT17 ENDE

Setup

Using docker

To facilitate setup and reproducibility, some Docker images are made available via the GitHub Container Registry: https://github.com/OpenNMT/OpenNMT-py/pkgs/container/opennmt-py

You can adapt the workflow and build your own image(s) depending on specific needs by using build.sh and Dockerfile in the docker directory of the repo.

docker pull ghcr.io/opennmt/opennmt-py:3.4.3-ubuntu22.04-cuda12.1

Example one-liner to run a container and open a bash shell within it:

docker run --rm -it --runtime=nvidia ghcr.io/opennmt/opennmt-py:test-ubuntu22.04-cuda12.1

Note: you need to have the Nvidia Container Toolkit (formerly nvidia-docker) installed to properly take advantage of the CUDA/GPU features.

Depending on your needs you can add various flags:

  • -p 5000:5000 to forward some exposed port from your container to your host;
  • -v /some/local/directory:/some/container/directory to mount some local directory to some container directory;
  • --entrypoint some_command to directly run some specific command as the container entry point (instead of the default bash shell);
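
For example, combining these flags (the port and host path below are placeholders):

docker run --rm -it --runtime=nvidia -p 5000:5000 -v $HOME/opennmt-data:/data ghcr.io/opennmt/opennmt-py:3.4.3-ubuntu22.04-cuda12.1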

Installing locally

OpenNMT-py requires:

  • Python >= 3.8
  • PyTorch >= 2.0 <2.2

Install OpenNMT-py from pip:

pip install OpenNMT-py

or from the source:

git clone https://github.com/OpenNMT/OpenNMT-py.git
cd OpenNMT-py
pip install -e .

Note: if you encounter a MemoryError during installation, try to use pip with --no-cache-dir.

(Optional) Some advanced features (e.g. working pretrained models or specific transforms) require extra packages; you can install them with:

pip install -r requirements.opt.txt

Manual installation of some dependencies

Apex is highly recommended for fast performance (especially the legacy fusedadam optimizer and FusedRMSNorm).

git clone https://github.com/NVIDIA/apex
cd apex
pip3 install -v --no-build-isolation --config-settings --build-option="--cpp_ext --cuda_ext --deprecated_fused_adam --xentropy --fast_multihead_attn" ./
cd ..

Flash attention:

As of Oct. 2023, flash attention 1 has been upstreamed to PyTorch v2, but it is recommended to use flash attention 2 (v2.3.1) for sliding-window attention support.

When using the regular position_encoding=True, or Rotary with max_relative_positions=-1, OpenNMT-py will try to use an optimized dot-product path.

If you want to use flash attention, you need to install it manually first:

pip install flash-attn --no-build-isolation

If flash attention 2 is not installed, F.scaled_dot_product_attention from PyTorch 2.x will be used instead.

When using max_relative_positions > 0, or Alibi with max_relative_positions=-2, OpenNMT-py will use its legacy code for the attention matrix multiplications.

Flash attention and F.scaled_dot_product_attention are a bit faster and save some GPU memory.
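
For reference, the relevant settings look like this in a training config (a sketch using only the options named above):

# eligible for flash attention / F.scaled_dot_product_attention
position_encoding: true          # standard position encoding
# or Rotary embeddings:
# position_encoding: false
# max_relative_positions: -1
# these fall back to the legacy attention code:
# max_relative_positions: 20     # relative position representations
# max_relative_positions: -2     # Alibi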

AWQ:

If you want to run inference on, or quantize, an AWQ model, you will need AutoAWQ.

For AutoAWQ: pip install autoawq

Documentation & FAQs

Full HTML Documentation

FAQs

Acknowledgements

OpenNMT-py is run as a collaborative open-source project. The project was incubated by Systran and Harvard NLP in 2016, originally in Lua, and ported to PyTorch in 2017.

Current maintainers (since 2018):

François Hernandez, Vincent Nguyen (Seedfall)

Citation

If you are using OpenNMT-py for academic work, please cite the OpenNMT system demonstration paper:

@misc{klein2018opennmt,
      title={OpenNMT: Neural Machine Translation Toolkit}, 
      author={Guillaume Klein and Yoon Kim and Yuntian Deng and Vincent Nguyen and Jean Senellart and Alexander M. Rush},
      year={2018},
      eprint={1805.11462},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

opennmt-py's People

Contributors

adamlerer, anderleich, apaszke, bmccann, bpopeters, da03, flauted, francoishernandez, funboarder13920, guillaumekln, gwenniger, helson73, jianyuzhan, jsenellart, justinchiu, l-k-11235, meocong, panosk, pltrdy, scarletpan, sebastiangehrmann, soumith, srush, tayciryahmed, thammegowda, vince62s, waino, wjbianjason, xutaima, zenglinxiao

opennmt-py's Issues

Training from a checkpoint on new data

Hello, I want to fine-tune a pretrained model on new data (incremental adaptation) using some new parameters (epochs, learning rate). However, I have been facing problems while using the train_from option due to varying vocabulary sizes (different data leads to different preprocessed .pt files, hence a different vocab) in the two runs. Could anyone please suggest a smooth way to achieve this? (i.e. training on new data from a checkpoint, using the existing vocab but a few new parameters)

a strange problem about saving memory in training process

Hello,

I have a strange problem here.

If I use the memoryEfficientLoss function to backward the loss, training seems normal,

but if I inline the content of memoryEfficientLoss into the trainEpoch function and do not define a separate memoryEfficientLoss function, training does not converge, even though all other code is the same.

And another question: I guess the split operation along the first dimension of the model outputs saves memory, but if so, how do we calculate gradients and run backward, and why do you call backward() twice (loss.backward() and outputs.backward())? Can you explain this? Thank you.

Can anyone tell me why? Any reply will be appreciated.
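
(For context on the double backward: below is a minimal sketch of the pattern memoryEfficientLoss implements, rewritten against current PyTorch; the names are illustrative, not the original code. The decoder outputs are detached, the generator and loss run on small chunks so only one chunk's logits are in memory at a time, each chunk's loss.backward() accumulates gradients on the detached tensor, and a final outputs.backward(flat.grad) pushes that gradient through the rest of the model.)

import torch

def memory_efficient_loss(outputs, targets, generator, criterion, chunk_size=32):
    # Detach the decoder outputs so the large generator/softmax graph stays out of
    # the main graph.
    flat = outputs.detach()
    flat.requires_grad_(True)
    total_loss = 0.0
    for out_chunk, tgt_chunk in zip(flat.split(chunk_size), targets.split(chunk_size)):
        logits = generator(out_chunk.view(-1, out_chunk.size(-1)))
        loss = criterion(logits, tgt_chunk.contiguous().view(-1))
        loss.backward()          # first backward: fills flat.grad, the chunk graph is freed
        total_loss += loss.item()
    outputs.backward(flat.grad)  # second backward: through the encoder/decoder
    return total_loss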

Beam search shape consistency

        decOut = decOut.squeeze(0)
        out = self.model.generator.forward(decOut)

        # batch x beam x numWords
        wordLk = out.view(beamSize, remainingSents, -1).transpose(0, 1).contiguous()
        attn = attn.view(beamSize, remainingSents, -1).transpose(0, 1).contiguous()


Here, shouldn't the view be out.view(remainingSents, beamSize, -1)?
The decStates, inputs to the Decoder are stacked by beamSize. Wouldn't the outputs be extracted in the wrong order if you do wordLk = out.view(beamSize, remainingSents, -1) and then take a transpose?

AttributeError: 'NMTModel' object has no attribute 'items'

When I tried to use the pre-trained model onmt_model_en_de_200k, by
python translate.py -gpu 0 -model onmt_model_en_de_200k-4783d9c3.pt -src data/multi30k/test.en.atok -tgt data/multi30k/test.de.atok -replace_unk -verbose -output multi30k.test.pred.atok
got the following message:

Traceback (most recent call last):
File "translate.py", line 135, in <module>
main()
File "translate.py", line 62, in main
translator = onmt.Translator(opt)
File "~/OpenNMT-py/onmt/Translator.py", line 26, in __init__
model.load_state_dict(checkpoint['model'])
File "~/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 328, in load_state_dict
for name, param in state_dict.items():
File "~/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 238, in __getattr__
type(self).__name__, name))
AttributeError: 'NMTModel' object has no attribute 'items'

I tried to modify the source code as some warnings appeared:

SourceChangeWarning: source code of class 'torch.nn.modules.dropout.Dropout' has changed. You can retrieve the original source code by accessing the object's source attribute or set torch.nn.Module.dump_patches = True and use the patch tool to revert the changes.

but I got the same error plus the following message:

SourceChangeWarning: source code of class 'torch.nn.modules.dropout.Dropout' has changed. Tried to save a patch, but couldn't create a writable file Dropout.patch. Make sure it doesn't exist and your working directory is writable.

AssertionError: Torch not compiled with CUDA enabled

when I run

python train.py -data data/multi30k.atok.low.train.pt -save_model multi30k_model -gpus 0

An error occurs. Could anyone help me?

Namespace(batch_size=64, brnn=False, brnn_merge='concat', curriculum=False, data='data/multi30k.atok.low.train.pt', dropout=0.3, epochs=13, extra_shuffle=False, gpus=[0], input_feed=1, layers=2, learning_rate=1.0, learning_rate_decay=0.5, log_interval=50, max_generator_batches=32, max_grad_norm=5, optim='sgd', param_init=0.1, pre_word_vecs_dec=None, pre_word_vecs_enc=None, rnn_size=500, save_model='multi30k_model', start_decay_at=8, start_epoch=1, train_from='', train_from_state_dict='', word_vec_size=500)
Loading data from 'data/multi30k.atok.low.train.pt'
* vocabulary size. source = 9799; target = 18006
* number of training sentences. 29000
* maximum batch size. 64
Building model...
Traceback (most recent call last):
File "train.py", line 356, in <module>
main()
File "train.py", line 315, in main
model.cuda()
File "/home/ljy/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 147, in cuda
return self._apply(lambda t: t.cuda(device_id))
File "/home/ljy/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 118, in _apply
module._apply(fn)
File "/home/ljy/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 118, in _apply
module._apply(fn)
File "/home/ljy/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 124, in _apply
param.data = fn(param.data)
File "/home/ljy/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 147, in <lambda>
return self._apply(lambda t: t.cuda(device_id))
File "/home/ljy/anaconda2/lib/python2.7/site-packages/torch/_utils.py", line 65, in _cuda
return new_type(self.size()).copy_(self, async)
File "/home/ljy/anaconda2/lib/python2.7/site-packages/torch/cuda/__init__.py", line 272, in __new__ _lazy_init()
File "/home/ljy/anaconda2/lib/python2.7/site-packages/torch/cuda/__init__.py", line 84, in _lazy_init _check_driver()
File "/home/ljy/anaconda2/lib/python2.7/site-packages/torch/cuda/__init__.py", line 51, in _check_driver raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

GPU & encoding problems with translate.py

Hi All,

So I just trained my first model with OpenNMT-py, everything looks great and easy to play with. But when I started to run the inference part, things start to get tricky.

First, since I was translating from Chinese to English, everything in translate.py that deals with files and I/O was not working because of encoding problems. I fixed this by substituting open with codecs.open. You may want to fix this, or I can submit a pull request after more thorough testing.

(Not sure if this is relevant, but I'm using Python 3.6.1.)

Second, I assume when I pass -gpu 0 I'm just using GPU device 0, but it does not look like so -- here is the nvidia-smi output when I run the program:

Tue Jun 20 17:38:50 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48                 Driver Version: 367.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K20m          On   | 0000:02:00.0     Off |                    0 |
| N/A   25C    P0    85W / 225W |    871MiB /  4742MiB |     93%   E. Process |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K20m          On   | 0000:03:00.0     Off |                    0 |
| N/A   23C    P0    46W / 225W |   1300MiB /  4742MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     27148    C   python                                         869MiB |
|    1     27148    C   python                                        1298MiB |
+-----------------------------------------------------------------------------+

which seems to indicate there is another mysterious process taking a lot of memory but not doing anything. Any idea what started that process? And is there any chance I can get rid of it?

Here is the exact command I used to run translate.py:

python ~/nmt/opennmt/translate.py -gpu 0 -model $PWD/model/model_acc_58.17_ppl_7.80_e13.pt -src data/eval08.bpe.zh -tgt data/eval08.en.tok -output out

Thanks!

Error loading pretrained word embeddings

When calling train.py with -pre_word_vecs_enc pretrained.embeddings.pt it displays an error

Traceback (most recent call last):
  File "train.py", line 352, in <module>
    main()
  File "train.py", line 289, in main
    encoder = onmt.Models.Encoder(opt, dicts['src'])
  File "/home/xxxx/RNN/onmt/Models.py", line 28, in __init__
    self.word_lut.weight.copy_(pretrained)
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 63, in __getattr__
    raise AttributeError(name)
AttributeError: copy_

It seems like there is nothing called copy_ in /usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py
I saved a torch.Tensor in the file pretrained.embeddings.pt and gave this file to the training script. Is there a bug?

RuntimeError: size mismatch (when using -brnn option)

When I train a model with default parameters and translate with translate.py, it works fine. When I use the brnn option for training, the training code works fine but translate.py throws a RuntimeError: size mismatch. Please help me out if I am doing anything wrong. These are the commands that I executed:

python preprocess.py -train_src data/src-train.txt -train_tgt data/trg-train.txt -valid_src data/src-val.txt -valid_tgt data/trg-val.txt -save_data data/ehtrans

python train.py -data data/ehtrans-train.pt -save_model eh/model -brnn -brnn_merge concat -epochs 1000 -cuda

python translate.py -model eh/model_e7_13.28.pt -src data/src-val.txt -tgt data/trg-val.txt -output file-tgt.tok -cuda

The preprocess and train commands work fine, but the translate command throws the following error:

Traceback (most recent call last):
  File "translate.py", line 121, in <module>
    main()
  File "translate.py", line 74, in main
    predBatch, predScore, goldScore = translator.translate(srcBatch, tgtBatch)
  File "/DATA/USERS/irshad/OpenNMT-py/onmt/Translator.py", line 190, in translate
    pred, predScore, attn, goldScore = self.translateBatch(batch)
  File "/DATA/USERS/irshad/OpenNMT-py/onmt/Translator.py", line 87, in translateBatch
    tgtBatch[:-1], decStates, context, initOutput)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 202, in __call__
    result = self.forward(*input, **kwargs)
  File "/DATA/USERS/irshad/OpenNMT-py/onmt/Models.py", line 119, in forward
    output, hidden = self.rnn(emb_t, hidden)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 202, in __call__
    result = self.forward(*input, **kwargs)
  File "/DATA/USERS/irshad/OpenNMT-py/onmt/Models.py", line 61, in forward
    h_1_i, c_1_i = layer(input, (h_0[i], c_0[i]))
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 202, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/rnn.py", line 472, in forward
    self.bias_ih, self.bias_hh,
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/_functions/rnn.py", line 22, in LSTMCell
    gates = F.linear(input, w_ih, b_ih) + F.linear(hx, w_hh, b_hh)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/functional.py", line 381, in linear
    return bias and state(input, weight, bias) or state(input, weight)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/_functions/linear.py", line 10, in forward
    output.addmm_(0, 1, input, weight.t())
RuntimeError: size mismatch, m1: [30 x 250], m2: [500 x 2000] at /home/soumith/local/builder/wheel/pytorch-src/torch/lib/TH/generic/THTensorMath.c:862

Slow in Training

Thanks for the PyTorch version of ONMT. It seems to me that training is slower than in the Lua version of ONMT. With the Lua version I was getting about 3000 tokens/sec, whereas here I am getting 1700 tokens/sec. In both versions I am using 4 layers and an rnn size of 1000. Is this the correct behavior, or am I missing something?

Pre-trained model not compatible with current implementation

The English-German pre-trained model (onmt_model_en_de_200k-4783d9c3.pt) is not compatible anymore with the current implementation.

When loading the model and translating, the following error emerges: AttributeError: 'StackedLSTM' object has no attribute 'num_layers'

I checked the history for the StackedLSTM class and it seems the initial version didn't include the num_layers attribute.

Context gate runtime error

How do I properly use the newly committed -context_gate feature? I couldn't make it work; I keep getting the following error.

$ python ../OpenNMT/train.py -gpus 0 -log_interval 100 -rnn_size 2048 -word_vec_size 1024 -context_gate both -optim adam -learning_rate 0.0001 -data data.000.000.train.pt
Namespace(batch_size=64, brnn=False, brnn_merge='concat', context_gate='both', curriculum=False, data='data.000.000.train.pt', dropout=0.3, encoder_type='text', epochs=13, extra_shuffle=False, gpus=[0], input_feed=1, layers=2, learning_rate=0.0001, learning_rate_decay=0.5, log_interval=100, max_generator_batches=32, max_grad_norm=5, optim='adam', param_init=0.1, pre_word_vecs_dec=None, pre_word_vecs_enc=None, rnn_size=2048, rnn_type='LSTM', save_model='model', seed=-1, start_decay_at=8, start_epoch=1, train_from='', train_from_state_dict='', word_vec_size=1024)
Loading data from 'data.000.000.train.pt'
 * vocabulary size. source = 10004; target = 10004
 * number of training sentences. 2000000
 * maximum batch size. 64
Building model...
* number of parameters: 208830228
NMTModel (
  (encoder): Encoder (
    (word_lut): Embedding(10004, 1024, padding_idx=0)
    (rnn): LSTM(1024, 2048, num_layers=2, dropout=0.3)
  ) 
  (decoder): Decoder (
    (word_lut): Embedding(10004, 1024, padding_idx=0)
    (rnn): StackedLSTM (
      (dropout): Dropout (p = 0.3)
      (layers): ModuleList (
        (0): LSTMCell(3072, 2048)
        (1): LSTMCell(2048, 2048)
      ) 
    ) 
    (attn): GlobalAttention (
      (linear_in): Linear (2048 -> 2048)
      (sm): Softmax ()
      (linear_out): Linear (4096 -> 2048)
      (tanh): Tanh ()
    ) 
    (context_gate): BothContextGate (
      (context_gate): ContextGate (
        (gate): Linear (5120 -> 2048)
        (sig): Sigmoid ()
        (source_proj): Linear (2048 -> 2048)
        (target_proj): Linear (3072 -> 2048)
      ) 
      (tanh): Tanh ()
    )  
    (dropout): Dropout (p = 0.3)
  )
  (generator): Sequential (
    (0): Linear (2048 -> 10004)
    (1): LogSoftmax ()
  )
)

Traceback (most recent call last):
  File "../OpenNMT/train.py", line 415, in <module>
    main()
  File "../OpenNMT/train.py", line 411, in main
    trainModel(model, trainData, validData, dataset, optim)
  File "../OpenNMT/train.py", line 268, in trainModel
    train_loss, train_acc = trainEpoch(epoch)
  File "../OpenNMT/train.py", line 229, in trainEpoch
    outputs = model(batch)
  File "/XXX/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/beritagar/workdir/src/OpenNMT-py/onmt/Models.py", line 198, in forward
    context, init_output)
  File "/XXX/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/beritagar/workdir/src/OpenNMT-py/onmt/Models.py", line 152, in forward
    emb_t.squeeze(0), rnn_output, attn_output
  File "/XXX/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/beritagar/workdir/src/OpenNMT-py/onmt/modules/Gate.py", line 89, in forward
    z, source, target = self.context_gate(prev_emb, dec_state, attn_state)
  File "/XXX/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/beritagar/workdir/src/OpenNMT-py/onmt/modules/Gate.py", line 39, in forward
    input_tensor = torch.cat((prev_emb, dec_state, attn_state), dim=2)
  File "/XXX/lib/python3.6/site-packages/torch/autograd/variable.py", line 841, in cat
    return Concat(dim)(*iterable)
  File "/XXX/lib/python3.6/site-packages/torch/autograd/_functions/tensor.py", line 309, in forward
    self.input_sizes = [i.size(self.dim) for i in inputs]
  File "/XXX/lib/python3.6/site-packages/torch/autograd/_functions/tensor.py", line 309, in <listcomp>
    self.input_sizes = [i.size(self.dim) for i in inputs]
RuntimeError: out of range at /py/conda-bld/pytorch_1493681908901/work/torch/lib/THC/generic/THCTensor.c:23

-brnn_merge is unused in the code?

-brnn_merge: Merge action for the bidirectional hidden states: [concat|sum]

I searched for the -brnn_merge flag in the code, but it seems that this option has not been implemented.

Thanks!

More pre-trained data

Hope this is not a dumb question.
Is there some way to get pre-trained data for other languages? I'm interested in contributing for other languages (such as Korean or Japanese), but cannot find any. Do I need to make it myself, or is there some way to obtain or create it?

Thanks.

beam search outputs the same result many times

Here, I use beam search and set beam_size and n_best both to 50, but the same output appears many times in the results. I don't know why. Is it because of beam search itself, or am I doing something wrong?

Is this line redundant?

When I use train_from_state_dict, it reports the following error:

Traceback (most recent call last):
File "train.py", line 356, in
main()
File "train.py", line 352, in main
trainModel(model, trainData, validData, dataset, optim)
File "train.py", line 234, in trainModel
train_loss, train_acc = trainEpoch(epoch)
File "train.py", line 206, in trainEpoch
optim.step()
File "/home/XXX/workplaces/OpenNMT-py/onmt/Optim.py", line 34, in step
self.optimizer.step()
File "/usr/local/lib/python2.7/dist-packages/torch/optim/adadelta.py", line 43, in step
state = self.state[p]
KeyError: Parameter containing:
-5.1825e-02 -1.6526e-02 4.0255e-02 ... 6.6652e-02 8.8333e-02 7.9287e-02
-6.2284e-02 -1.1338e-01 1.4133e-01 ... 6.0823e-02 -1.1704e-01 -2.1166e-02
-1.8695e-01 8.3002e-02 -1.1960e-01 ... -1.0802e-01 3.1869e-01 3.1139e-02
... ⋱ ...
-9.4983e-02 -5.2894e-02 -9.2437e-02 ... 3.6116e-02 -2.0674e-01 6.3990e-02
-7.7295e-02 -1.6950e-01 3.5867e-02 ... 4.5825e-02 -5.6685e-02 -6.2091e-03
-3.7790e-02 -1.1555e-01 -3.2032e-02 ... -1.9250e-01 -7.9354e-02 -1.6450e-01
[torch.cuda.FloatTensor of size 30008x620 (GPU 3)]

So, I removed this part. I think that when opt.train_from_state_dict is used, there is no need to call optim.optimizer.load_state_dict(checkpoint['optim'].optimizer.state_dict()).

Error when training from a checkpoint

if opt.train_from:
    print('Loading model from checkpoint at %s' % opt.train_from)
    chk_model = checkpoint['model']
    generator_state_dict = chk_model.generator.state_dict()
    model_state_dict = {k: v for k, v in chk_model.state_dict().items() if 'generator' not in k}
    model.load_state_dict(model_state_dict)
    generator.load_state_dict(generator_state_dict)
    opt.start_epoch = checkpoint['epoch'] + 1

The code above loads the previous training state from a checkpoint; however, when I tried to do so, it raised the following error:

File "train.py", line 317, in main
generator_state_dict = chk_model.generator.state_dict()
AttributeError: 'dict' object has no attribute 'generator'

After reading the code, I found that the author already saves model_state_dict and generator_state_dict when saving models. The relevant code is:

model_state_dict = model.module.state_dict() if len(opt.gpus) > 1 else model.state_dict()
model_state_dict = {k: v for k, v in model_state_dict.items() if 'generator' not in k}
generator_state_dict = model.generator.module.state_dict() if len(opt.gpus) > 1 else model.generator.state_dict()
checkpoint = {
            'model': model_state_dict,
            'generator': generator_state_dict,
            'dicts': dataset['dicts'],
            'opt': opt,
            'epoch': epoch,
            'optim': optim
        }
torch.save(checkpoint, '%s_acc_%.2f_ppl_%.2f_e%d.pt' % (opt.save_model, 100*valid_acc, valid_ppl, epoch))

So when you would like to train from a checkpoint, just modify the code

if opt.train_from:
        print('Loading model from checkpoint at %s' % opt.train_from)
        chk_model = checkpoint['model']
        generator_state_dict = chk_model.generator.state_dict()
        model_state_dict = {k: v for k, v in chk_model.state_dict().items() if 'generator' not in k}
        model.load_state_dict(model_state_dict)
        generator.load_state_dict(generator_state_dict)
        opt.start_epoch = checkpoint['epoch'] + 1

to

if opt.train_from:
        print('Loading model from checkpoint at %s' % opt.train_from)
        model_state_dict = checkpoint['model']
        generator_state_dict = checkpoint['generator']
        model.load_state_dict(model_state_dict)
        generator.load_state_dict(generator_state_dict)
        opt.start_epoch = checkpoint['epoch'] + 1

and then it works.

potential memory leak on large scale dataset

Hi,

I am training a dialog system with opennmt-py on GPUs
My dataset contains 26,265,224 sequence pairs (available here https://github.com/jiweil/Neural-Dialogue-Generation)

I observed a potential memory leak on CPU: GPU memory consumption remains constant (2.9G), but CPU RAM is almost eaten up.
Initially the cpu memory consumption is around 27G (my dataset is large and it's acceptable), but after 38.5 hours (around 3.4 epochs) of training, it becomes 53G.

Here is my training script

CUDA_VISIBLE_DEVICES=3 python $codedir/train.py -data $data -save_model $model -gpus 0 -batch_size 128 -max_generator_batches 64 -rnn_size 512 -word_vec_size 512 2>&1 | tee $label.log.txt

Since I am training models on GPUs rather than CPUs, but CPU memory consumption keeps increasing, it might have something to do with the dynamic computational graph, which I believe is created on the CPU. It is likely that after each batch of training the dynamic graph is not destroyed properly. I am not sure whether this is because the maximum sequence length differs from batch to batch and PyTorch has to create a new graph per batch.

Here is my torch version

>>> import torch
>>> torch.__version__
'0.1.10+ac9245a'

Great perplexity on training but useless translations

Hi,

I'm running the latest OpenNMT-py with the latest PyTorch and CUDA 7.5 on a K80 GPU.
I ran the training both with the data provided in the example and with the en-fr data from IWSLT 2016. In both cases, the perplexity during training gets very low (single digits for every minibatch), but the accuracy is always 0.0.
Moreover, when I try to translate the validation sets, the translations seem totally random.

Translation Crashes when ground truth output is not provided

It seems like the current translation implementation assumes we always have access to the ground truth target translations (provided using the -tgt flag). This is not desirable when we don't have access to the ground truth target sentences.

StackedLSTM vs nn.LSTM

Hi, is there any difference between StackedLSTM and pytorch LSTM?
Specifically, it looks like the effect of

nn.LSTM(input_size, hidden_size,
                        num_layers=layers,
                        dropout=dropout)

and

StackedLSTM(layers, input_size, hidden_size, dropout)

is largely the same.

Is there any reason why StackedLSTM was used?
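
(For context, a minimal sketch of the input-feeding pattern that usually motivates a stack of LSTMCells; the thread itself does not spell this out. With input feeding, the attention output from step t-1 is concatenated to the embedding at step t, so the decoder has to be unrolled one step at a time, which nn.LSTM, consuming a whole sequence at once, does not allow.)

import torch
import torch.nn as nn

emb_dim, hid, batch = 500, 500, 4
cell = nn.LSTMCell(emb_dim + hid, hid)        # first layer sees [embedding ; previous attention output]
h = c = torch.zeros(batch, hid)
attn_out = torch.zeros(batch, hid)            # the "input feed", updated after attention at every step
for emb_t in torch.randn(7, batch, emb_dim):  # 7 decoding steps
    h, c = cell(torch.cat([emb_t, attn_out], dim=1), (h, c))
    # ... attention over the encoder states would go here and produce a new attn_out ...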

Translate fails when using model trained with DataParallel

GPU: 4 Titan X

Traceback (most recent call last):
File "/home/nikola/code/pytorch-examples/examples/OpenNMT/translate.py", line 116, in
main()
File "/home/nikola/code/pytorch-examples/examples/OpenNMT/translate.py", line 77, in main
predBatch, predScore, goldScore = translator.translate(srcBatch, tgtBatch)
File "/home/nikola/code/pytorch-examples/examples/OpenNMT/onmt/Translator.py", line 195, in translate
pred, predScore, attn, goldScore = self.translateBatch(batch)
File "/home/nikola/code/pytorch-examples/examples/OpenNMT/onmt/Translator.py", line 60, in translateBatch
encStates, context_t = self.model.encoder(srcBatch_t, hidden=encStates)
File "/home/nikola/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 235, in getattr
return object.getattribute(self, name)
AttributeError: 'DataParallel' object has no attribute 'encoder'

If I change self.model.encoder to self.model.module.encoder on line 60, I get OOM. Loading this model from a checkpoint to resume training works fine.
Thanks!

No module named utils.rnn

$ python preprocess.py -train_src data/src-train.txt -train_tgt data/tgt-train.txt -valid_src data/src-val.txt -valid_tgt data/tgt-val.txt -save_data data/demo
Traceback (most recent call last):
File "preprocess.py", line 1, in
import onmt
File "/home/zeng/opensource/OpenNMT-py/onmt/init.py", line 2, in
import onmt.Models
File "/home/zeng/opensource/OpenNMT-py/onmt/Models.py", line 5, in
from torch.nn.utils.rnn import pad_packed_sequence as unpack
ImportError: No module named utils.rnn

About the _fix_enc_hidden() function

Hi, this project is great.
I found this code

    def _fix_enc_hidden(self, h):
        #  the encoder hidden is  (layers*directions) x batch x dim
        #  we need to convert it to layers x batch x (directions*dim)
        if self.encoder.num_directions == 2:
            return h.view(h.size(0) // 2, 2, h.size(1), h.size(2)) \
                    .transpose(1, 2).contiguous() \
                    .view(h.size(0) // 2, h.size(1), h.size(2) * 2)
        else:
            return h

If we do this, we assume the h_n is like
[
layer0_forward
layer0_backward
layer1_forward
layer1_backward
layer2_forward
layer2_backward
...
]
So how can I know that it's the former, rather than:
[
layer0_forward
layer1_forward
layer2_forward
layer0_backward
layer1_backward
layer2_backward
...
]
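
(A quick check, assuming a recent PyTorch: the docs state that h_n can be viewed as (num_layers, num_directions, batch, hidden), which is exactly the layer-major ordering assumed above. The snippet below verifies it against the encoder output.)

import torch
import torch.nn as nn

layers, batch, inp, hid = 3, 4, 5, 7
rnn = nn.LSTM(inp, hid, num_layers=layers, bidirectional=True)
out, (h_n, c_n) = rnn(torch.randn(10, batch, inp))

h = h_n.view(layers, 2, batch, hid)                # (layer, direction, batch, hidden)
print(torch.allclose(h[-1, 0], out[-1, :, :hid]))  # top layer, forward: last time step -> True
print(torch.allclose(h[-1, 1], out[0, :, hid:]))   # top layer, backward: first time step -> True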

optim.optimizer.state_dict.state not found when loading from checkpoint

I encountered an error when using the Adam optimizer and resuming training from a checkpoint, which says that the state in the Adam optimizer is not found. I found that the line optim.set_parameters(model.parameters()) wipes out the states in optim.optimizer.state_dict.state.

if not opt.train_from_state_dict and not opt.train_from:
        for p in model.parameters():
            p.data.uniform_(-opt.param_init, opt.param_init)
        encoder.load_pretrained_vectors(opt)
        decoder.load_pretrained_vectors(opt)
        optim = onmt.Optim(
            opt.optim, opt.learning_rate, opt.max_grad_norm,
            lr_decay=opt.learning_rate_decay,
            start_decay_at=opt.start_decay_at
        )
    else:
        print('Loading optimizer from checkpoint:')
        optim = checkpoint['optim']
        print(optim)

    optim.set_parameters(model.parameters())

After I moved the line optim.set_parameters(model.parameters()) into the block under the if statement, the code works fine:

if not opt.train_from_state_dict and not opt.train_from:
        for p in model.parameters():
            p.data.uniform_(-opt.param_init, opt.param_init)
        encoder.load_pretrained_vectors(opt)
        decoder.load_pretrained_vectors(opt)
        optim = onmt.Optim(
            opt.optim, opt.learning_rate, opt.max_grad_norm,
            lr_decay=opt.learning_rate_decay,
            start_decay_at=opt.start_decay_at
        )
        optim.set_parameters(model.parameters())
    else:
        print('Loading optimizer from checkpoint:')
        optim = checkpoint['optim']
        print(optim)

About MultiGPU Usage

When I use OpenNMT-py with multiple GPUs, it reports the following error:

Traceback (most recent call last):
File "train.py", line 356, in
main()
File "train.py", line 352, in main
trainModel(model, trainData, validData, dataset, optim)
File "train.py", line 234, in trainModel
train_loss, train_acc = trainEpoch(epoch)
File "train.py", line 198, in trainEpoch
outputs = model(batch)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in call
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/data_parallel.py", line 61, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/data_parallel.py", line 71, in parallel_apply
return parallel_apply(replicas, inputs, kwargs)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/parallel_apply.py", line 45, in parallel_apply
raise output
ValueError: lengths array has incorrect size

But with a single GPU there is no problem. Is this a bug or something else?

batchify in class Dataset ignores last batch in input

I have a training dataset with 1950 sentences. When I set the batch size to 64 and create a new dataset instance, I observed that the created dataset has only 30 batches rather than 31; by inspection, I found that the last 30 sentences have been ignored. The correct behavior would be to create 31 batches rather than 30, with the last batch smaller than 64, which should be fine!
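
(A minimal sketch of the fix being suggested, with a hypothetical helper name; the point is simply to use ceiling division so the short tail batch is kept.)

import math

def num_batches(num_examples, batch_size):
    # 1950 examples with batch_size 64 -> 31 batches (the last one holds 30 examples),
    # instead of 1950 // 64 == 30, which silently drops the tail.
    return math.ceil(num_examples / batch_size)

assert num_batches(1950, 64) == 31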

MultiGPU issues

Hi,
This might be already known to you guys (since the README does mention at the end that multi-GPU is not supported), but I encountered this just now so thought I might give a heads up!
The code is supposed to support multiGPU via nn.DataParallel, however, it is breaking due to usage of nn.utils.pack_padded_sequence in the encoder. Basically, when the input data of padded sequences is sent to a DataParallel, it is split along the batch dimension, but the list containing the lengths is not split, resulting in a size mismatch error as follows:

File "/somepath/torch/nn/parallel/parallel_apply.py", line 45, in parallel_apply
    raise output
ValueError: lengths array has incorrect size

If the input is a tuple of a B*L*D Variable and a B-length list of lengths, where B is the batch size, L is the padded length, and D is the feature size, nn.DataParallel scatters the B*L*D tensor into B/K*L*D chunks, where K is the number of GPUs, while the B-length list is shallow-copied. This causes nn.utils.pack_padded_sequence to raise an error.

Translator replace_unk exception when max attention is on EOS

Translator.py#61
uses attention to replace unknowns in decoded tokens, but sometimes the max attention is on EOS, which causes an exception.

File "/home/user/pytorch-seq2seq/onmt/Translator.py", line 65, in buildTargetTokens
tokens[i] = src[maxIndex[0]]
IndexError: list index out of range

Should we guard against this case? Or maybe exclude specials when finding the max attention token?
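
(A minimal sketch of the second suggestion, restricting the argmax to real source tokens; the helper name is hypothetical, not the repo's code.)

import torch

def pick_source_index(attn_row, num_src_tokens):
    # attn_row: 1-D attention weights over the source positions for one target token.
    # Only consider the real source tokens, so EOS/padding can never win the argmax.
    _, max_index = attn_row[:num_src_tokens].max(0)
    return int(max_index)

attn_row = torch.tensor([0.1, 0.2, 0.1, 0.6])       # the last position (EOS) has the max weight
src = ["das", "ist", "gut"]
print(src[pick_source_index(attn_row, len(src))])   # "ist" instead of an IndexError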

multiple GPUs do not reduce training time

I am trying to use multiple GPUs in training, but I am not able to reduce the training time.

I have a machine with 3 GPUs (GeForce GTX 1080), and I train a network (details below).
I tried different numbers of GPUs (1, 2, or 3) and different batch sizes (64, 128, 192, 248).
Here is a table reporting the time for one epoch:

batch_size    1 GPU    2 GPUs    3 GPUs
64            43 s     78 s      94 s
128           35 s     51 s      60 s
192           32 s     43 s      50 s
248           30 s     40 s      44 s

I also notice that the GPU utilization is quite low when multiple GPUs are used:
with 1 GPU: 80-90% utilization
with 2 GPUs: 45-55% utilization
with 3 GPUs: 35-45% utilization

I am using this setting (gpus and batch_size vary according to the experiments):
Namespace(batch_size=128, brnn=False, brnn_merge='concat', context_gate=None, curriculum=False, data='debugging/model.train.pt', dropout=0.3, encoder_type='text', epochs=13, extra_shuffle=False, gpus=[0], input_feed=1, layers=2, learning_rate=1.0, learning_rate_decay=0.5, log_interval=50, max_generator_batches=32, max_grad_norm=5, optim='sgd', param_init=0.1, pre_word_vecs_dec=None, pre_word_vecs_enc=None, rnn_size=500, rnn_type='LSTM', save_model='debugging/model', seed=-1, start_decay_at=8, start_epoch=1, train_from='', train_from_state_dict='', word_vec_size=500)

and this commit 58c8b52

Why does training speed not scale with the number of GPUs? Rather, adding GPUs seems to slow training down.

Have you already noticed this behavior?
Am I doing any error?

Any comment is welcome.

No such file or directory: 'multi30k_model_e13_*.pt'

I did the training successfully

...
Epoch 13,   450/  454; acc:  75.87; ppl:   2.87; 2975 src tok/s; 3085 tgt tok/s;   1714 s elapsed
Train perplexity: 2.91169
Train accuracy: 75.7449
Validation perplexity: 5.64726
Validation accuracy: 70.0177
Decaying learning rate to 0.015625

and then from the same folder

root@27277298c897:/# python translate.py -gpu 0 -model multi30k_model_e13_*.pt -src data/multi30k/test.en.atok -tgt data/multi30k/test.de.atok -replace_unk -verbose -output multi30k.test.pred.atok
Traceback (most recent call last):
  File "translate.py", line 135, in <module>
    main()
  File "translate.py", line 62, in main
    translator = onmt.Translator(opt)
  File "/musixmatch/onmt/Translator.py", line 12, in __init__
    checkpoint = torch.load(opt.model)
  File "/usr/local/lib/python2.7/dist-packages/torch/serialization.py", line 220, in load
    f = open(f, 'rb')
IOError: [Errno 2] No such file or directory: 'multi30k_model_e13_*.pt'

The root folder content is

-rw-r--r-- 1 root root      1137 Apr  3 15:07 LICENSE.md
-rw-r--r-- 1 root root      3606 Apr  3 15:07 README.md
drwxr-xr-x 5 root root      4096 Apr  3 15:53 data
-rw-r--r-- 1 root root      4826 Apr  3 15:53 multi-bleu.perl
-rw-r--r-- 1 root root 131591914 Apr  3 15:56 multi30k_model_acc_28.59_ppl_86.75_e1.pt
-rw-r--r-- 1 root root 131591914 Apr  3 15:58 multi30k_model_acc_37.08_ppl_38.35_e2.pt
-rw-r--r-- 1 root root 131591914 Apr  3 16:00 multi30k_model_acc_52.30_ppl_15.70_e3.pt
-rw-r--r-- 1 root root 131591914 Apr  3 16:02 multi30k_model_acc_58.82_ppl_10.89_e4.pt
-rw-r--r-- 1 root root 131591914 Apr  3 16:04 multi30k_model_acc_62.22_ppl_8.55_e5.pt
-rw-r--r-- 1 root root 131591914 Apr  3 16:09 multi30k_model_acc_64.36_ppl_7.17_e7.pt
-rw-r--r-- 1 root root 131591914 Apr  3 16:07 multi30k_model_acc_64.60_ppl_7.32_e6.pt
-rw-r--r-- 1 root root 131591914 Apr  3 16:11 multi30k_model_acc_66.44_ppl_6.52_e8.pt
-rw-r--r-- 1 root root 131591914 Apr  3 16:13 multi30k_model_acc_68.62_ppl_5.88_e9.pt
-rw-r--r-- 1 root root 131591914 Apr  3 16:15 multi30k_model_acc_69.69_ppl_5.68_e10.pt
-rw-r--r-- 1 root root 131591914 Apr  3 16:18 multi30k_model_acc_69.71_ppl_5.67_e11.pt
-rw-r--r-- 1 root root 131591914 Apr  3 16:22 multi30k_model_acc_70.02_ppl_5.65_e13.pt
-rw-r--r-- 1 root root 131591914 Apr  3 16:20 multi30k_model_acc_70.04_ppl_5.64_e12.pt
-rw-r--r-- 1 root root    196775 Apr  3 15:53 nonbreaking_prefix.de
-rw-r--r-- 1 root root    159010 Apr  3 15:53 nonbreaking_prefix.en
drwxr-xr-x 4 root root      4096 Apr  3 15:53 onmt
-rw-r--r-- 1 root root      5977 Apr  3 15:07 preprocess.py
-rwxr-xr-x 1 root root       464 Apr  3 15:52 preprocess.sh
-rw-r--r-- 1 root root     16790 Apr  3 15:53 tokenizer.perl
-rw-r--r-- 1 root root     14316 Apr  3 15:07 train.py
-rw-r--r-- 1 root root         0 Apr  3 15:44 train.sh
-rw-r--r-- 1 root root      4774 Apr  3 15:07 translate.py

So I do not see the multi30k_model_e13_*.pt model file there.

No mask is set for Attn during training

In Decoder.forward, no mask is set on the attention model before the attention computation. The softmax will therefore receive 0 (the padding value) as input, and the corresponding output will be exp(0)/sum exp(x_i) != 0.
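
(A minimal sketch of the masking being requested, with illustrative shapes and names rather than the repo's API: padded source positions are filled with -inf before the softmax so they get exactly zero attention weight.)

import torch
import torch.nn.functional as F

scores = torch.randn(4, 1, 9)                  # (batch, tgt_len, src_len) attention scores
src_pad_mask = torch.zeros(4, 1, 9, dtype=torch.bool)
src_pad_mask[:, :, 7:] = True                  # the last two source positions are padding
scores = scores.masked_fill(src_pad_mask, float("-inf"))
align = F.softmax(scores, dim=-1)              # padded positions now receive exactly 0 weight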

translate.py crashes with a medium size seq2seq model

python translate.py -model test_model
GPU : titan X pascal

error message:
THCudaCheck FAIL file=/data/users/soumith/miniconda2/conda-bld/pytorch-cuda80-0.1.10_1488756735684/work/torch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
Traceback (most recent call last):
File "translate.py", line 116, in
main()
File "translate.py", line 77, in main
predBatch, predScore, goldScore = translator.translate(srcBatch, tgtBatch)
File "/home/user_name/openNMT_proj_name/onmt/Translator.py", line 199, in translate
pred, predScore, attn, goldScore = self.translateBatch(batch)
File "/home/user_name/openNMT_proj_name/onmt/Translator.py", line 64, in translateBatch
encStates, context_t = self.model.encoder(srcBatch_t, hidden=encStates)
File "/home/user_name/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 202, in call
result = self.forward(*input, **kwargs)
File "/home/user_name/openNMT_proj_name/onmt/Models.py", line 40, in forward
outputs, hidden_t = self.rnn(emb, hidden)
File "/home/user_name/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 202, in call
result = self.forward(*input, **kwargs)
File "/home/user_name/anaconda2/lib/python2.7/site-packages/torch/nn/modules/rnn.py", line 91, in forward
output, hidden = func(input, self.all_weights, hx)
File "/home/user_name/anaconda2/lib/python2.7/site-packages/torch/nn/_functions/rnn.py", line 327, in forward
return func(input, *fargs, **fkwargs)
File "/home/user_name/anaconda2/lib/python2.7/site-packages/torch/autograd/function.py", line 201, in _do_forward
flat_output = super(NestedIOFunction, self)._do_forward(*flat_input)
File "/home/user_name/anaconda2/lib/python2.7/site-packages/torch/autograd/function.py", line 223, in forward
result = self.forward_extended(*nested_tensors)
File "/home/user_name/anaconda2/lib/python2.7/site-packages/torch/nn/_functions/rnn.py", line 269, in forward_extended
cudnn.rnn.forward(self, input, hx, weight, output, hy)
File "/home/user_name/anaconda2/lib/python2.7/site-packages/torch/backends/cudnn/rnn.py", line 247, in forward
fn.weight_buf = x.new(num_weights)
RuntimeError: cuda runtime error (2) : out of memory at /data/users/soumith/miniconda2/conda-bld/pytorch-cuda80-0.1.10_1488756735684/work/torch/lib/THC/generic/THCStorage.cu:66

Mask for attention

I wonder why we don't apply a mask in GlobalAttention, since we padded zeros on the right. I see there is actually an applyMask function, but it is not used.
Thank you.

Where is multi-bleu.perl?

haixu@my-machine:~/Desktop/OpenNMT-cls-mdn$ perl multi-bleu.perl data/test.de.atok < cls2mdn.test.pred.atok
Can't open perl script "multi-bleu.perl": No such file or directory

AttributeError: 'module' object has no attribute '_cuda_setDevice'

when I try to train the model,
python train.py -data data/multi30k.atok.low.train.pt -save_model multi30k_model -gpus 0
An error occurs. What should I do?
Namespace(batch_size=64, brnn=False, brnn_merge='concat', curriculum=False, data='data/multi30k.atok.low.train.pt', dropout=0.3, epochs=13, extra_shuffle=False, gpus=[0], input_feed=1, layers=2, learning_rate=1.0, learning_rate_decay=0.5, log_interval=50, max_generator_batches=32, max_grad_norm=5, optim='sgd', param_init=0.1, pre_word_vecs_dec=None, pre_word_vecs_enc=None, rnn_size=500, save_model='multi30k_model', start_decay_at=8, start_epoch=1, train_from='', train_from_state_dict='', word_vec_size=500)
Traceback (most recent call last):
  File "train.py", line 119, in <module>
    cuda.set_device(opt.gpus[0])
  File "/home/ljy/anaconda2/lib/python2.7/site-packages/torch/cuda/__init__.py", line 161, in set_device
    torch._C._cuda_setDevice(device)
AttributeError: 'module' object has no attribute '_cuda_setDevice'

Got an error when translating a GRU -brnn model

/Users/ChaiDuo/.virtualenvs/pytorch/bin/python /Users/ChaiDuo/Code/Project/OpenNMT-py/translate.py
Traceback (most recent call last):
  File "/Users/ChaiDuo/Code/Project/OpenNMT-py/translate.py", line 162, in <module>
    main()
  File "/Users/ChaiDuo/Code/Project/OpenNMT-py/translate.py", line 109, in main
    tgtBatch)
  File "/Users/ChaiDuo/Code/Project/OpenNMT-py/onmt/Translator.py", line 268, in translate
    pred, predScore, attn, goldScore = self.translateBatch(src, tgt)
  File "/Users/ChaiDuo/Code/Project/OpenNMT-py/onmt/Translator.py", line 117, in translateBatch
    encStates = (self.model._fix_enc_hidden(encStates[0]),
  File "/Users/ChaiDuo/Code/Project/OpenNMT-py/onmt/Models.py", line 166, in _fix_enc_hidden
    return h.view(h.size(0) // 2, 2, h.size(1), h.size(2)) \
RuntimeError: dimension 2 out of range of 2D tensor at /Users/soumith/code/builder/wheel/pytorch-src/torch/lib/TH/generic/THTensor.c:24

Process finished with exit code 1


`TypeError: 'NoneType' object is not callable` during training

I was trying to train on a Mac with CPU using the following steps:

  1. Preprocess the data and shrink src and tgt to only the first 100 sentences by inserting the following lines after line 133 in preprocess.py:
    shrink = True
    if shrink:
        src = src[0:100]
        tgt = tgt[0:100]

then, I ran

python preprocess.py -train_src data/src-train.txt -train_tgt data/tgt-train.txt -valid_src data/src-val.txt -valid_tgt data/tgt-val.txt -save_data data/demo

  2. Then I trained using python train.py -data data/demo.train.pt -save_model demo_model

It then ran OK for a while before an error appeared:

(dlnd-tf-lab)  ->python train.py -data data/demo.train.pt -save_model demo_model
Namespace(batch_size=64, brnn=False, brnn_merge='concat', curriculum=False, data='data/demo.train.pt', dropout=0.3, epochs=13, extra_shuffle=False, gpus=[], input_feed=1, layers=2, learning_rate=1.0, learning_rate_decay=0.5, log_interval=50, max_generator_batches=32, max_grad_norm=5, optim='sgd', param_init=0.1, pre_word_vecs_dec=None, pre_word_vecs_enc=None, rnn_size=500, save_model='demo_model', start_decay_at=8, start_epoch=1, train_from='', train_from_state_dict='', word_vec_size=500)
Loading data from 'data/demo.train.pt'
 * vocabulary size. source = 24999; target = 35820
 * number of training sentences. 100
 * maximum batch size. 64
Building model...
* number of parameters: 58121320
NMTModel (
  (encoder): Encoder (
    (word_lut): Embedding(24999, 500, padding_idx=0)
    (rnn): LSTM(500, 500, num_layers=2, dropout=0.3)
  )
  (decoder): Decoder (
    (word_lut): Embedding(35820, 500, padding_idx=0)
    (rnn): StackedLSTM (
      (dropout): Dropout (p = 0.3)
      (layers): ModuleList (
        (0): LSTMCell(1000, 500)
        (1): LSTMCell(500, 500)
      )
    )
    (attn): GlobalAttention (
      (linear_in): Linear (500 -> 500)
      (sm): Softmax ()
      (linear_out): Linear (1000 -> 500)
      (tanh): Tanh ()
    )
    (dropout): Dropout (p = 0.3)
  )
  (generator): Sequential (
    (0): Linear (500 -> 35820)
    (1): LogSoftmax ()
  )
)

Train perplexity: 29508.9
Train accuracy: 0.0216306
Validation perplexity: 4.50917e+08
Validation accuracy: 3.57853

Train perplexity: 1.07012e+07
Train accuracy: 0.06198
Validation perplexity: 103639
Validation accuracy: 0.944334

Train perplexity: 458795
Train accuracy: 0.031198
Validation perplexity: 43578.2
Validation accuracy: 3.42942

Train perplexity: 144931
Train accuracy: 0.0432612
Validation perplexity: 78366.8
Validation accuracy: 2.33598
Decaying learning rate to 0.5

Train perplexity: 58696.8
Train accuracy: 0.0278702
Validation perplexity: 14045.8
Validation accuracy: 3.67793
Decaying learning rate to 0.25

Train perplexity: 10045.1
Train accuracy: 0.0457571
Validation perplexity: 26435.6
Validation accuracy: 4.87078
Decaying learning rate to 0.125

Train perplexity: 10301.5
Train accuracy: 0.0490849
Validation perplexity: 24243.5
Validation accuracy: 3.62823
Decaying learning rate to 0.0625

Train perplexity: 7927.77
Train accuracy: 0.062812
Validation perplexity: 7180.49
Validation accuracy: 5.31809
Decaying learning rate to 0.03125

Train perplexity: 4573.5
Train accuracy: 0.047421
Validation perplexity: 6545.51
Validation accuracy: 5.6163
Decaying learning rate to 0.015625

Train perplexity: 3995.7
Train accuracy: 0.0549085
Validation perplexity: 6316.25
Validation accuracy: 5.4175
Decaying learning rate to 0.0078125

Train perplexity: 3715.81
Train accuracy: 0.0540765
Validation perplexity: 6197.91
Validation accuracy: 5.86481
Decaying learning rate to 0.00390625

Train perplexity: 3672.46
Train accuracy: 0.0540765
Validation perplexity: 6144.18
Validation accuracy: 6.01392
Decaying learning rate to 0.00195312

Train perplexity: 3689.7
Train accuracy: 0.0528286
Validation perplexity: 6113.55
Validation accuracy: 6.31213
Decaying learning rate to 0.000976562
Exception ignored in: <function WeakValueDictionary.__init__.<locals>.remove at 0x118b19b70>
Traceback (most recent call last):
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/weakref.py", line 117, in remove
TypeError: 'NoneType' object is not callable

Could you tell me how to fix it? Thanks!
