opennmt / opennmt-py

Open Source Neural Machine Translation and (Large) Language Models in PyTorch

Home Page: https://opennmt.net/

License: MIT License

Python 95.84% Shell 3.56% Perl 0.50% Dockerfile 0.09%
deep-learning pytorch machine-translation neural-machine-translation language-model llms

opennmt-py's Introduction

OpenNMT-py: Open-Source Neural Machine Translation and (Large) Language Models


OpenNMT-py is the PyTorch version of the OpenNMT project, an open-source (MIT) neural machine translation (and beyond!) framework. It is designed to be research-friendly, making it easy to try out new ideas in translation, language modeling, summarization, and many other NLP tasks. Several companies have also proven the code to be production-ready.

We love contributions! Please look at issues marked with the contributions welcome tag.

Before raising an issue, make sure you read the requirements and the Full Documentation examples.

Unless there is a bug, please use the Forum or Gitter to ask questions.


For beginners:

There is a step-by-step, fully explained tutorial (thanks to Yasmin Moslem): Tutorial

Please read and/or follow it before raising beginner issues.

Otherwise, you can have a look at the Quickstart steps.
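
As a rough sketch of what the Quickstart covers (the config and file names below are placeholders; the exact flags are documented there), the usual sequence with the command-line tools is:

onmt_build_vocab -config my_config.yaml -n_sample 10000
onmt_train -config my_config.yaml
onmt_translate -model run/model_step_1000.pt -src src-test.txt -output pred.txt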


New:

  • You will need PyTorch v2, preferably v2.2, which fixes some scaled_dot_product_attention issues.
  • LLM support with converters for: Llama (+ Mistral), OpenLlama, Redpajama, MPT-7B, Falcon.
  • Support for 8-bit and 4-bit quantization along with LoRA adapters, with or without checkpointing.
  • You can finetune 7B and 13B models on a single 24 GB RTX card with 4-bit quantization.
  • Inference can be forced in 4/8-bit using the same layer quantization as in finetuning.
  • Tensor parallelism when the model does not fit in one GPU's memory (both training and inference).
  • Once your model is finetuned, you can run inference either with OpenNMT-py or, faster, with CTranslate2.
  • MMLU evaluation script; see the results here.

For all use cases, including NMT, you can now use multi-query attention instead of multi-head attention (faster at training and inference) and remove the biases from all Linear layers (QKV as well as FeedForward modules).
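
As an illustration, these options are set in the training YAML config. The following is only a partial sketch in the spirit of the tutorials below; the exact option names and layer lists should be checked against the documentation for your version:

# 4-bit quantization + LoRA for finetuning a 7B model on a single 24 GB GPU (sketch)
quant_layers: ['w_1', 'w_2', 'linear_values', 'linear_query', 'linear_keys', 'final_linear']
quant_type: "bnb_NF4"
lora_layers: ['linear_values', 'linear_query', 'linear_keys', 'final_linear']
lora_rank: 2
lora_dropout: 0.05
lora_alpha: 8
lora_embedding: false
# multi-query attention and bias-free Linear layers
multiquery: true
add_qkvbias: false
add_ffnbias: false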

If you used previous versions of OpenNMT-py, you can check the Changelog or the Breaking Changes


Tutorials:

  • How to replicate Vicuna with a 7B or 13B Llama (or OpenLlama, MPT-7B, Redpajama) language model: Tuto Vicuna
  • How to finetune NLLB-200 with your dataset: Tuto Finetune NLLB-200
  • How to create a simple OpenNMT-py REST Server: Tuto REST
  • How to create a simple Web Interface: Tuto Streamlit
  • Replicate the WMT17 en-de experiment: WMT17 ENDE

Setup

Using docker

To facilitate setup and reproducibility, some Docker images are made available via the GitHub Container Registry: https://github.com/OpenNMT/OpenNMT-py/pkgs/container/opennmt-py

You can adapt the workflow and build your own image(s) depending on specific needs by using build.sh and Dockerfile in the docker directory of the repo.

docker pull ghcr.io/opennmt/opennmt-py:3.4.3-ubuntu22.04-cuda12.1

Example one-liner to run a container and open a bash shell within it:

docker run --rm -it --runtime=nvidia ghcr.io/opennmt/opennmt-py:test-ubuntu22.04-cuda12.1

Note: you need to have the Nvidia Container Toolkit (formerly nvidia-docker) installed to properly take advantage of the CUDA/GPU features.

Depending on your needs you can add various flags:

  • -p 5000:5000 to forward some exposed port from your container to your host;
  • -v /some/local/directory:/some/container/directory to mount some local directory to some container directory;
  • --entrypoint some_command to directly run some specific command as the container entry point (instead of the default bash shell);
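
For example, combining these flags (the port and host path below are placeholders):

docker run --rm -it --runtime=nvidia -p 5000:5000 -v $HOME/opennmt-data:/data ghcr.io/opennmt/opennmt-py:3.4.3-ubuntu22.04-cuda12.1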

Installing locally

OpenNMT-py requires:

  • Python >= 3.8
  • PyTorch >= 2.0 <2.2

Install OpenNMT-py from pip:

pip install OpenNMT-py

or from the source:

git clone https://github.com/OpenNMT/OpenNMT-py.git
cd OpenNMT-py
pip install -e .

Note: if you encounter a MemoryError during installation, try to use pip with --no-cache-dir.

(Optional) Some advanced features (e.g. working pretrained models or specific transforms) require extra packages; you can install them with:

pip install -r requirements.opt.txt

Manual installation of some dependencies

Apex is highly recommended for fast performance (especially the legacy fusedadam optimizer and FusedRMSNorm).

git clone https://github.com/NVIDIA/apex
cd apex
pip3 install -v --no-build-isolation --config-settings --build-option="--cpp_ext --cuda_ext --deprecated_fused_adam --xentropy --fast_multihead_attn" ./
cd ..

Flash attention:

As of Oct. 2023, flash attention 1 has been upstreamed to PyTorch v2, but it is recommended to use flash attention 2 (v2.3.1) for sliding-window attention support.

When using the regular position_encoding=True, or Rotary with max_relative_positions=-1, OpenNMT-py will try to use an optimized dot-product path.

If you want to use flash attention, you need to install it manually first:

pip install flash-attn --no-build-isolation

If flash attention 2 is not installed, F.scaled_dot_product_attention from PyTorch 2.x will be used instead.

When using max_relative_positions > 0, or Alibi with max_relative_positions=-2, OpenNMT-py will use its legacy code for the attention matrix multiplications.

Flash attention and F.scaled_dot_product_attention are a bit faster and save some GPU memory.
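
For reference, the relevant settings look like this in a training config (a sketch using only the options named above):

# eligible for flash attention / F.scaled_dot_product_attention
position_encoding: true          # standard position encoding
# or Rotary embeddings:
# position_encoding: false
# max_relative_positions: -1
# these fall back to the legacy attention code:
# max_relative_positions: 20     # relative position representations
# max_relative_positions: -2     # Alibi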

AWQ:

If you want to run inference on, or quantize, an AWQ model, you will need AutoAWQ.

For AutoAWQ: pip install autoawq

Documentation & FAQs

Full HTML Documentation

FAQs

Acknowledgements

OpenNMT-py is run as a collaborative open-source project. The project was incubated by Systran and Harvard NLP in 2016, originally in Lua, and ported to PyTorch in 2017.

Current maintainers (since 2018):

François Hernandez, Vincent Nguyen (Seedfall)

Citation

If you are using OpenNMT-py for academic work, please cite the OpenNMT system demonstration paper:

@misc{klein2018opennmt,
      title={OpenNMT: Neural Machine Translation Toolkit}, 
      author={Guillaume Klein and Yoon Kim and Yuntian Deng and Vincent Nguyen and Jean Senellart and Alexander M. Rush},
      year={2018},
      eprint={1805.11462},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

opennmt-py's People

Contributors

adamlerer, anderleich, apaszke, bmccann, bpopeters, da03, flauted, francoishernandez, funboarder13920, guillaumekln, gwenniger, helson73, jianyuzhan, jsenellart, justinchiu, l-k-11235, meocong, panosk, pltrdy, scarletpan, sebastiangehrmann, soumith, srush, tayciryahmed, thammegowda, vince62s, waino, wjbianjason, xutaima, zenglinxiao

opennmt-py's Issues

Training from a checkpoint on new data

Hello, I want to fine-tune a pretrained model on new data (incremental adaptation) using some new parameters (epochs, learning rate). However, I have been facing problems while using the train_from option due to varying vocabulary sizes (different data leads to different preprocessed .pt files, hence a different vocab) in the two runs. Could anyone please suggest a smooth way to achieve this? (i.e. training on new data from a checkpoint, using the existing vocab but a few new parameters)

a strange problem about saving memory in training process

Hello,

I have a strange problem here.

If I use the memoryEfficientLoss function to backward the loss, training seems normal,

but if I inline the content of memoryEfficientLoss into the trainEpoch function and do not define a separate memoryEfficientLoss function, training does not converge, even though all other code is the same.

And another question: I guess the split operation along the first dimension of the model outputs saves memory, but if so, how do we calculate gradients and run backward, and why do you call backward() twice (loss.backward() and outputs.backward())? Can you explain this? Thank you.

Can anyone tell me why? Any reply will be appreciated.
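
(For context on the double backward: below is a minimal sketch of the pattern memoryEfficientLoss implements, rewritten against current PyTorch; the names are illustrative, not the original code. The decoder outputs are detached, the generator and loss run on small chunks so only one chunk's logits are in memory at a time, each chunk's loss.backward() accumulates gradients on the detached tensor, and a final outputs.backward(flat.grad) pushes that gradient through the rest of the model.)

import torch

def memory_efficient_loss(outputs, targets, generator, criterion, chunk_size=32):
    # Detach the decoder outputs so the large generator/softmax graph stays out of
    # the main graph.
    flat = outputs.detach()
    flat.requires_grad_(True)
    total_loss = 0.0
    for out_chunk, tgt_chunk in zip(flat.split(chunk_size), targets.split(chunk_size)):
        logits = generator(out_chunk.view(-1, out_chunk.size(-1)))
        loss = criterion(logits, tgt_chunk.contiguous().view(-1))
        loss.backward()          # first backward: fills flat.grad, the chunk graph is freed
        total_loss += loss.item()
    outputs.backward(flat.grad)  # second backward: through the encoder/decoder
    return total_loss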

Beam search shape consistency

        decOut = decOut.squeeze(0)
        out = self.model.generator.forward(decOut)

        # batch x beam x numWords
        wordLk = out.view(beamSize, remainingSents, -1).transpose(0, 1).contiguous()
        attn = attn.view(beamSize, remainingSents, -1).transpose(0, 1).contiguous()


Here, shouldn't the view be out.view(remainingSents, beamSize, -1)?
The decStates, inputs to the Decoder are stacked by beamSize. Wouldn't the outputs be extracted in the wrong order if you do wordLk = out.view(beamSize, remainingSents, -1) and then take a transpose?

AttributeError: 'NMTModel' object has no attribute 'items'

When I tried to use the pre-trained model onmt_model_en_de_200k, by
python translate.py -gpu 0 -model onmt_model_en_de_200k-4783d9c3.pt -src data/multi30k/test.en.atok -tgt data/multi30k/test.de.atok -replace_unk -verbose -output multi30k.test.pred.atok
got the following message:

Traceback (most recent call last):
File "translate.py", line 135, in <module>
main()
File "translate.py", line 62, in main
translator = onmt.Translator(opt)
File "~/OpenNMT-py/onmt/Translator.py", line 26, in __init__
model.load_state_dict(checkpoint['model'])
File "~/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 328, in load_state_dict
for name, param in state_dict.items():
File "~/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 238, in __getattr__
type(self).__name__, name))
AttributeError: 'NMTModel' object has no attribute 'items'

I tried to modify the source code as some warnings appeared:

SourceChangeWarning: source code of class 'torch.nn.modules.dropout.Dropout' has changed. You can retrieve the original source code by accessing the object's source attribute or set torch.nn.Module.dump_patches = True and use the patch tool to revert the changes.

but I got the same error plus the following message:

SourceChangeWarning: source code of class 'torch.nn.modules.dropout.Dropout' has changed. Tried to save a patch, but couldn't create a writable file Dropout.patch. Make sure it doesn't exist and your working directory is writable.

AssertionError: Torch not compiled with CUDA enabled

when I run

python train.py -data data/multi30k.atok.low.train.pt -save_model multi30k_model -gpus 0

An error occurs. Could anyone help me?

Namespace(batch_size=64, brnn=False, brnn_merge='concat', curriculum=False, data='data/multi30k.atok.low.train.pt', dropout=0.3, epochs=13, extra_shuffle=False, gpus=[0], input_feed=1, layers=2, learning_rate=1.0, learning_rate_decay=0.5, log_interval=50, max_generator_batches=32, max_grad_norm=5, optim='sgd', param_init=0.1, pre_word_vecs_dec=None, pre_word_vecs_enc=None, rnn_size=500, save_model='multi30k_model', start_decay_at=8, start_epoch=1, train_from='', train_from_state_dict='', word_vec_size=500)
Loading data from 'data/multi30k.atok.low.train.pt'
* vocabulary size. source = 9799; target = 18006
* number of training sentences. 29000
* maximum batch size. 64
Building model...
Traceback (most recent call last):
File "train.py", line 356, in <module>
main()
File "train.py", line 315, in main
model.cuda()
File "/home/ljy/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 147, in cuda
return self._apply(lambda t: t.cuda(device_id))
File "/home/ljy/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 118, in _apply
module._apply(fn)
File "/home/ljy/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 118, in _apply
module._apply(fn)
File "/home/ljy/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 124, in _apply
param.data = fn(param.data)
File "/home/ljy/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 147, in <lambda>
return self._apply(lambda t: t.cuda(device_id))
File "/home/ljy/anaconda2/lib/python2.7/site-packages/torch/_utils.py", line 65, in _cuda
return new_type(self.size()).copy_(self, async)
File "/home/ljy/anaconda2/lib/python2.7/site-packages/torch/cuda/__init__.py", line 272, in __new__ _lazy_init()
File "/home/ljy/anaconda2/lib/python2.7/site-packages/torch/cuda/__init__.py", line 84, in _lazy_init _check_driver()
File "/home/ljy/anaconda2/lib/python2.7/site-packages/torch/cuda/__init__.py", line 51, in _check_driver raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

GPU & encoding problems with translate.py

Hi All,

So I just trained my first model with OpenNMT-py, everything looks great and easy to play with. But when I started to run the inference part, things start to get tricky.

First, since I was translating from Chinese to English, everything in translate.py that deals with files and I/O was not working because of encoding problems. I fixed this by substituting open with codecs.open. You may want to fix this, or I can submit a pull request after more thorough testing.

(Not sure if this is relevant, but I'm using Python 3.6.1.)

Second, I assume when I pass -gpu 0 I'm just using GPU device 0, but it does not look like so -- here is the nvidia-smi output when I run the program:

Tue Jun 20 17:38:50 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48                 Driver Version: 367.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K20m          On   | 0000:02:00.0     Off |                    0 |
| N/A   25C    P0    85W / 225W |    871MiB /  4742MiB |     93%   E. Process |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K20m          On   | 0000:03:00.0     Off |                    0 |
| N/A   23C    P0    46W / 225W |   1300MiB /  4742MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     27148    C   python                                         869MiB |
|    1     27148    C   python                                        1298MiB |
+-----------------------------------------------------------------------------+

which seems to indicate there is another mysterious process taking a lot of memory but not doing anything. Any idea what started that process? And is there any chance I can get rid of it?

Here is the exact command I used to run translate.py:

python ~/nmt/opennmt/translate.py -gpu 0 -model $PWD/model/model_acc_58.17_ppl_7.80_e13.pt -src data/eval08.bpe.zh -tgt data/eval08.en.tok -output out

Thanks!

Error loading pretrained word embeddings

When calling train.py with -pre_word_vecs_enc pretrained.embeddings.pt it displays an error

Traceback (most recent call last):
  File "train.py", line 352, in <module>
    main()
  File "train.py", line 289, in main
    encoder = onmt.Models.Encoder(opt, dicts['src'])
  File "/home/xxxx/RNN/onmt/Models.py", line 28, in __init__
    self.word_lut.weight.copy_(pretrained)
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 63, in __getattr__
    raise AttributeError(name)
AttributeError: copy_

It seems like there is nothing called copy_ in /usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py
I saved a torch.Tensor in the file pretrained.embeddings.pt and gave this file to the training script. Is there a bug?

RuntimeError: size mismatch (when using -brnn option)

When I train a model with default parameters and translate with translate.py, it works fine. When I use the brnn option for training, the training code works fine but translate.py throws a RuntimeError: size mismatch. Please help me out if I am doing anything wrong. These are the commands that I executed:

python preprocess.py -train_src data/src-train.txt -train_tgt data/trg-train.txt -valid_src data/src-val.txt -valid_tgt data/trg-val.txt -save_data data/ehtrans

python train.py -data data/ehtrans-train.pt -save_model eh/model -brnn -brnn_merge concat -epochs 1000 -cuda

python translate.py -model eh/model_e7_13.28.pt -src data/src-val.txt -tgt data/trg-val.txt -output file-tgt.tok -cuda

The preprocess and train commands work fine, but the translate command throws the following error:

Traceback (most recent call last):
  File "translate.py", line 121, in <module>
    main()
  File "translate.py", line 74, in main
    predBatch, predScore, goldScore = translator.translate(srcBatch, tgtBatch)
  File "/DATA/USERS/irshad/OpenNMT-py/onmt/Translator.py", line 190, in translate
    pred, predScore, attn, goldScore = self.translateBatch(batch)
  File "/DATA/USERS/irshad/OpenNMT-py/onmt/Translator.py", line 87, in translateBatch
    tgtBatch[:-1], decStates, context, initOutput)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 202, in __call__
    result = self.forward(*input, **kwargs)
  File "/DATA/USERS/irshad/OpenNMT-py/onmt/Models.py", line 119, in forward
    output, hidden = self.rnn(emb_t, hidden)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 202, in __call__
    result = self.forward(*input, **kwargs)
  File "/DATA/USERS/irshad/OpenNMT-py/onmt/Models.py", line 61, in forward
    h_1_i, c_1_i = layer(input, (h_0[i], c_0[i]))
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 202, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/rnn.py", line 472, in forward
    self.bias_ih, self.bias_hh,
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/_functions/rnn.py", line 22, in LSTMCell
    gates = F.linear(input, w_ih, b_ih) + F.linear(hx, w_hh, b_hh)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/functional.py", line 381, in linear
    return bias and state(input, weight, bias) or state(input, weight)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/_functions/linear.py", line 10, in forward
    output.addmm_(0, 1, input, weight.t())
RuntimeError: size mismatch, m1: [30 x 250], m2: [500 x 2000] at /home/soumith/local/builder/wheel/pytorch-src/torch/lib/TH/generic/THTensorMath.c:862

Slow in Training

Thanks for the PyTorch version of ONMT. It seems to me that training is slower than in the Lua version of ONMT. With the Lua version I was getting about 3000 tokens/sec, whereas here I am getting 1700 tokens/sec. In both versions I am using 4 layers and an rnn size of 1000. Is this the correct behavior, or am I missing something?

Pre-trained model not compatible with current implementation

The English-German pre-trained model (onmt_model_en_de_200k-4783d9c3.pt) is not compatible anymore with the current implementation.

When loading the model and translating, the following error emerges: AttributeError: 'StackedLSTM' object has no attribute 'num_layers'

I checked the history for the StackedLSTM class and it seems the initial version didn't include the num_layers attribute.

Context gate runtime error

How do I properly use the newly committed -context_gate feature? I couldn't make it work; I keep getting the following error.

$ python ../OpenNMT/train.py -gpus 0 -log_interval 100 -rnn_size 2048 -word_vec_size 1024 -context_gate both -optim adam -learning_rate 0.0001 -data data.000.000.train.pt
Namespace(batch_size=64, brnn=False, brnn_merge='concat', context_gate='both', curriculum=False, data='data.000.000.train.pt', dropout=0.3, encoder_type='text', epochs=13, extra_shuffle=False, gpus=[0], input_feed=1, layers=2, learning_rate=0.0001, learning_rate_decay=0.5, log_interval=100, max_generator_batches=32, max_grad_norm=5, optim='adam', param_init=0.1, pre_word_vecs_dec=None, pre_word_vecs_enc=None, rnn_size=2048, rnn_type='LSTM', save_model='model', seed=-1, start_decay_at=8, start_epoch=1, train_from='', train_from_state_dict='', word_vec_size=1024)
Loading data from 'data.000.000.train.pt'
 * vocabulary size. source = 10004; target = 10004
 * number of training sentences. 2000000
 * maximum batch size. 64
Building model...
* number of parameters: 208830228
NMTModel (
  (encoder): Encoder (
    (word_lut): Embedding(10004, 1024, padding_idx=0)
    (rnn): LSTM(1024, 2048, num_layers=2, dropout=0.3)
  ) 
  (decoder): Decoder (
    (word_lut): Embedding(10004, 1024, padding_idx=0)
    (rnn): StackedLSTM (
      (dropout): Dropout (p = 0.3)
      (layers): ModuleList (
        (0): LSTMCell(3072, 2048)
        (1): LSTMCell(2048, 2048)
      ) 
    ) 
    (attn): GlobalAttention (
      (linear_in): Linear (2048 -> 2048)
      (sm): Softmax ()
      (linear_out): Linear (4096 -> 2048)
      (tanh): Tanh ()
    ) 
    (context_gate): BothContextGate (
      (context_gate): ContextGate (
        (gate): Linear (5120 -> 2048)
        (sig): Sigmoid ()
        (source_proj): Linear (2048 -> 2048)
        (target_proj): Linear (3072 -> 2048)
      ) 
      (tanh): Tanh ()
    )  
    (dropout): Dropout (p = 0.3)
  )
  (generator): Sequential (
    (0): Linear (2048 -> 10004)
    (1): LogSoftmax ()
  )
)

Traceback (most recent call last):
  File "../OpenNMT/train.py", line 415, in <module>
    main()
  File "../OpenNMT/train.py", line 411, in main
    trainModel(model, trainData, validData, dataset, optim)
  File "../OpenNMT/train.py", line 268, in trainModel
    train_loss, train_acc = trainEpoch(epoch)
  File "../OpenNMT/train.py", line 229, in trainEpoch
    outputs = model(batch)
  File "/XXX/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/beritagar/workdir/src/OpenNMT-py/onmt/Models.py", line 198, in forward
    context, init_output)
  File "/XXX/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/beritagar/workdir/src/OpenNMT-py/onmt/Models.py", line 152, in forward
    emb_t.squeeze(0), rnn_output, attn_output
  File "/XXX/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/beritagar/workdir/src/OpenNMT-py/onmt/modules/Gate.py", line 89, in forward
    z, source, target = self.context_gate(prev_emb, dec_state, attn_state)
  File "/XXX/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/beritagar/workdir/src/OpenNMT-py/onmt/modules/Gate.py", line 39, in forward
    input_tensor = torch.cat((prev_emb, dec_state, attn_state), dim=2)
  File "/XXX/lib/python3.6/site-packages/torch/autograd/variable.py", line 841, in cat
    return Concat(dim)(*iterable)
  File "/XXX/lib/python3.6/site-packages/torch/autograd/_functions/tensor.py", line 309, in forward
    self.input_sizes = [i.size(self.dim) for i in inputs]
  File "/XXX/lib/python3.6/site-packages/torch/autograd/_functions/tensor.py", line 309, in <listcomp>
    self.input_sizes = [i.size(self.dim) for i in inputs]
RuntimeError: out of range at /py/conda-bld/pytorch_1493681908901/work/torch/lib/THC/generic/THCTensor.c:23

-brnn_merge is unused in the code?

-brnn_merge: Merge action for the bidirectional hidden states: [concat|sum]

I searched for the -brnn_merge flag in the code, but it seems that this option has not been implemented.

Thanks!

More pre-trained data

Hope this is not a dumb question.
Is there some way to get pre-trained data for other languages? I'm interested in contributing for other languages (such as Korean or Japanese), but cannot find any. Do I need to make it myself, or is there some way to obtain or create it?

Thanks.

beam search outputs the same result many times

Here, I use beam search and set beam_size and n_best both to 50, but the same output appears many times in the results. I don't know why. Is it because of beam search itself, or am I doing something wrong?

Is this line redundant?

When I use train_from_state_dict, it reports the following error:

Traceback (most recent call last):
File "train.py", line 356, in
main()
File "train.py", line 352, in main
trainModel(model, trainData, validData, dataset, optim)
File "train.py", line 234, in trainModel
train_loss, train_acc = trainEpoch(epoch)
File "train.py", line 206, in trainEpoch
optim.step()
File "/home/XXX/workplaces/OpenNMT-py/onmt/Optim.py", line 34, in step
self.optimizer.step()
File "/usr/local/lib/python2.7/dist-packages/torch/optim/adadelta.py", line 43, in step
state = self.state[p]
KeyError: Parameter containing:
-5.1825e-02 -1.6526e-02 4.0255e-02 ... 6.6652e-02 8.8333e-02 7.9287e-02
-6.2284e-02 -1.1338e-01 1.4133e-01 ... 6.0823e-02 -1.1704e-01 -2.1166e-02
-1.8695e-01 8.3002e-02 -1.1960e-01 ... -1.0802e-01 3.1869e-01 3.1139e-02
... ⋱ ...
-9.4983e-02 -5.2894e-02 -9.2437e-02 ... 3.6116e-02 -2.0674e-01 6.3990e-02
-7.7295e-02 -1.6950e-01 3.5867e-02 ... 4.5825e-02 -5.6685e-02 -6.2091e-03
-3.7790e-02 -1.1555e-01 -3.2032e-02 ... -1.9250e-01 -7.9354e-02 -1.6450e-01
[torch.cuda.FloatTensor of size 30008x620 (GPU 3)]

So, I removed this part. I think that when opt.train_from_state_dict is used, there is no need to call optim.optimizer.load_state_dict(checkpoint['optim'].optimizer.state_dict()).

Error when training from a checkpoint

if opt.train_from:
    print('Loading model from checkpoint at %s' % opt.train_from)
    chk_model = checkpoint['model']
    generator_state_dict = chk_model.generator.state_dict()
    model_state_dict = {k: v for k, v in chk_model.state_dict().items() if 'generator' not in k}
    model.load_state_dict(model_state_dict)
    generator.load_state_dict(generator_state_dict)
    opt.start_epoch = checkpoint['epoch'] + 1

The code above loads the previous training state from a checkpoint; however, when I tried to do so, it raised the following error:

File "train.py", line 317, in main
generator_state_dict = chk_model.generator.state_dict()
AttributeError: 'dict' object has no attribute 'generator'

After reading the code, I found that the author already saves model_state_dict and generator_state_dict when saving models. The relevant code is:

model_state_dict = model.module.state_dict() if len(opt.gpus) > 1 else model.state_dict()
model_state_dict = {k: v for k, v in model_state_dict.items() if 'generator' not in k}
generator_state_dict = model.generator.module.state_dict() if len(opt.gpus) > 1 else model.generator.state_dict()
checkpoint = {
            'model': model_state_dict,
            'generator': generator_state_dict,
            'dicts': dataset['dicts'],
            'opt': opt,
            'epoch': epoch,
            'optim': optim
        }
torch.save(checkpoint, '%s_acc_%.2f_ppl_%.2f_e%d.pt' % (opt.save_model, 100*valid_acc, valid_ppl, epoch))

So when you would like to train from a checkpoint, just modify the code

if opt.train_from:
        print('Loading model from checkpoint at %s' % opt.train_from)
        chk_model = checkpoint['model']
        generator_state_dict = chk_model.generator.state_dict()
        model_state_dict = {k: v for k, v in chk_model.state_dict().items() if 'generator' not in k}
        model.load_state_dict(model_state_dict)
        generator.load_state_dict(generator_state_dict)
        opt.start_epoch = checkpoint['epoch'] + 1

to

if opt.train_from:
        print('Loading model from checkpoint at %s' % opt.train_from)
        model_state_dict = checkpoint['model']
        generator_state_dict = checkpoint['generator']
        model.load_state_dict(model_state_dict)
        generator.load_state_dict(generator_state_dict)
        opt.start_epoch = checkpoint['epoch'] + 1

and then it works.

potential memory leak on large scale dataset

Hi,

I am training a dialog system with opennmt-py on GPUs
My dataset contains 26,265,224 sequence pairs (available here https://github.com/jiweil/Neural-Dialogue-Generation)

I observed a potential memory leak on CPU: GPU memory consumption remains constant (2.9G), but CPU RAM is almost eaten up.
Initially the cpu memory consumption is around 27G (my dataset is large and it's acceptable), but after 38.5 hours (around 3.4 epochs) of training, it becomes 53G.

Here is my training script

CUDA_VISIBLE_DEVICES=3 python $codedir/train.py -data $data -save_model $model -gpus 0 -batch_size 128 -max_generator_batches 64 -rnn_size 512 -word_vec_size 512 2>&1 | tee $label.log.txt

Since I am training models on GPUs rather than CPUs, but CPU memory consumption keeps increasing, it might have something to do with the dynamic computational graph, which I believe is created on the CPU. It is likely that after each batch of training the dynamic graph is not destroyed properly. I am not sure whether this is because the maximum sequence length differs from batch to batch and PyTorch has to create a new graph per batch.

Here is my torch version

>>> import torch
>>> torch.__version__
'0.1.10+ac9245a'

Great perplexity on training but useless translations

Hi,

I'm running the latest OpenNMT-py with the latest PyTorch and CUDA 7.5 on a K80 GPU.
I ran the training both with the data provided in the example and with the en-fr data from IWSLT 2016. In both cases, the perplexity during training gets very low (single digits for every minibatch), but the accuracy is always 0.0.
Moreover, when I try to translate the validation sets, the translations seem totally random.

Translation Crashes when ground truth output is not provided

It seems like the current translation implementation assumes we always have access to the ground truth target translations (provided using the -tgt flag). This is not desirable when we don't have access to the ground truth target sentences.

StackedLSTM vs nn.LSTM

Hi, is there any difference between StackedLSTM and pytorch LSTM?
Specifically, it looks like the effect of

nn.LSTM(input_size, hidden_size,
                        num_layers=layers,
                        dropout=dropout)

and

StackedLSTM(layers, input_size, hidden_size, dropout)

is largely the same.

Is there any reason why StackedLSTM was used?
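
(For context, a minimal sketch of the input-feeding pattern that usually motivates a stack of LSTMCells; the thread itself does not spell this out. With input feeding, the attention output from step t-1 is concatenated to the embedding at step t, so the decoder has to be unrolled one step at a time, which nn.LSTM, consuming a whole sequence at once, does not allow.)

import torch
import torch.nn as nn

emb_dim, hid, batch = 500, 500, 4
cell = nn.LSTMCell(emb_dim + hid, hid)        # first layer sees [embedding ; previous attention output]
h = c = torch.zeros(batch, hid)
attn_out = torch.zeros(batch, hid)            # the "input feed", updated after attention at every step
for emb_t in torch.randn(7, batch, emb_dim):  # 7 decoding steps
    h, c = cell(torch.cat([emb_t, attn_out], dim=1), (h, c))
    # ... attention over the encoder states would go here and produce a new attn_out ...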

Translate fails when using model trained with DataParallel

GPU: 4 Titan X

Traceback (most recent call last):
File "/home/nikola/code/pytorch-examples/examples/OpenNMT/translate.py", line 116, in
main()
File "/home/nikola/code/pytorch-examples/examples/OpenNMT/translate.py", line 77, in main
predBatch, predScore, goldScore = translator.translate(srcBatch, tgtBatch)
File "/home/nikola/code/pytorch-examples/examples/OpenNMT/onmt/Translator.py", line 195, in translate
pred, predScore, attn, goldScore = self.translateBatch(batch)
File "/home/nikola/code/pytorch-examples/examples/OpenNMT/onmt/Translator.py", line 60, in translateBatch
encStates, context_t = self.model.encoder(srcBatch_t, hidden=encStates)
File "/home/nikola/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 235, in getattr
return object.getattribute(self, name)
AttributeError: 'DataParallel' object has no attribute 'encoder'

If I change self.model.encoder to self.model.module.encoder on line 60, I get OOM. Loading this model from a checkpoint to resume training works fine.
Thanks!

No module named utils.rnn

$ python preprocess.py -train_src data/src-train.txt -train_tgt data/tgt-train.txt -valid_src data/src-val.txt -valid_tgt data/tgt-val.txt -save_data data/demo
Traceback (most recent call last):
File "preprocess.py", line 1, in
import onmt
File "/home/zeng/opensource/OpenNMT-py/onmt/init.py", line 2, in
import onmt.Models
File "/home/zeng/opensource/OpenNMT-py/onmt/Models.py", line 5, in
from torch.nn.utils.rnn import pad_packed_sequence as unpack
ImportError: No module named utils.rnn

About the _fix_enc_hidden() function

Hi, this project is great.
I found this code

    def _fix_enc_hidden(self, h):
        #  the encoder hidden is  (layers*directions) x batch x dim
        #  we need to convert it to layers x batch x (directions*dim)
        if self.encoder.num_directions == 2:
            return h.view(h.size(0) // 2, 2, h.size(1), h.size(2)) \
                    .transpose(1, 2).contiguous() \
                    .view(h.size(0) // 2, h.size(1), h.size(2) * 2)
        else:
            return h

If we do this, we assume the h_n is like
[
layer0_forward
layer0_backward
layer1_forward
layer1_backward
layer2_forward
layer2_backward
...
]
So how can I know that it's the former, rather than:
[
layer0_forward
layer1_forward
layer2_forward
layer0_backward
layer1_backward
layer2_backward
...
]
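
(A quick check, assuming a recent PyTorch: the docs state that h_n can be viewed as (num_layers, num_directions, batch, hidden), which is exactly the layer-major ordering assumed above. The snippet below verifies it against the encoder output.)

import torch
import torch.nn as nn

layers, batch, inp, hid = 3, 4, 5, 7
rnn = nn.LSTM(inp, hid, num_layers=layers, bidirectional=True)
out, (h_n, c_n) = rnn(torch.randn(10, batch, inp))

h = h_n.view(layers, 2, batch, hid)                # (layer, direction, batch, hidden)
print(torch.allclose(h[-1, 0], out[-1, :, :hid]))  # top layer, forward: last time step -> True
print(torch.allclose(h[-1, 1], out[0, :, hid:]))   # top layer, backward: first time step -> True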

optim.optimizer.state_dict.state not found when loading from checkpoint

I encountered an error when using the Adam optimizer and resuming training from a checkpoint, which says that the state in the Adam optimizer is not found. I found that the line optim.set_parameters(model.parameters()) wipes out the states in optim.optimizer.state_dict.state.

if not opt.train_from_state_dict and not opt.train_from:
        for p in model.parameters():
            p.data.uniform_(-opt.param_init, opt.param_init)
        encoder.load_pretrained_vectors(opt)
        decoder.load_pretrained_vectors(opt)
        optim = onmt.Optim(
            opt.optim, opt.learning_rate, opt.max_grad_norm,
            lr_decay=opt.learning_rate_decay,
            start_decay_at=opt.start_decay_at
        )
    else:
        print('Loading optimizer from checkpoint:')
        optim = checkpoint['optim']
        print(optim)

    optim.set_parameters(model.parameters())

After I moved the line optim.set_parameters(model.parameters()) into the block under the if statement, the code works fine:

if not opt.train_from_state_dict and not opt.train_from:
        for p in model.parameters():
            p.data.uniform_(-opt.param_init, opt.param_init)
        encoder.load_pretrained_vectors(opt)
        decoder.load_pretrained_vectors(opt)
        optim = onmt.Optim(
            opt.optim, opt.learning_rate, opt.max_grad_norm,
            lr_decay=opt.learning_rate_decay,
            start_decay_at=opt.start_decay_at
        )
        optim.set_parameters(model.parameters())
    else:
        print('Loading optimizer from checkpoint:')
        optim = checkpoint['optim']
        print(optim)

About MultiGPU Usage

When I use OpenNMT-py with multiple GPUs, it reports the following error:

Traceback (most recent call last):
File "train.py", line 356, in
main()
File "train.py", line 352, in main
trainModel(model, trainData, validData, dataset, optim)
File "train.py", line 234, in trainModel
train_loss, train_acc = trainEpoch(epoch)
File "train.py", line 198, in trainEpoch
outputs = model(batch)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in call
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/data_parallel.py", line 61, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/data_parallel.py", line 71, in parallel_apply
return parallel_apply(replicas, inputs, kwargs)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/parallel/parallel_apply.py", line 45, in parallel_apply
raise output
ValueError: lengths array has incorrect size

But with a single GPU there is no problem. Is this a bug or something else?

batchify in class Dataset ignores last batch in input

I have a training dataset with 1950 sentences. When I set the batch size to 64 and create a new dataset instance, I observed that the created dataset has only 30 batches rather than 31; by inspection, I found that the last 30 sentences have been ignored. The correct behavior would be to create 31 batches rather than 30, with the last batch smaller than 64, which should be fine!
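
(A minimal sketch of the fix being suggested, with a hypothetical helper name; the point is simply to use ceiling division so the short tail batch is kept.)

import math

def num_batches(num_examples, batch_size):
    # 1950 examples with batch_size 64 -> 31 batches (the last one holds 30 examples),
    # instead of 1950 // 64 == 30, which silently drops the tail.
    return math.ceil(num_examples / batch_size)

assert num_batches(1950, 64) == 31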

MultiGPU issues

Hi,
This might be already known to you guys (since the README does mention at the end that multi-GPU is not supported), but I encountered this just now so thought I might give a heads up!
The code is supposed to support multiGPU via nn.DataParallel, however, it is breaking due to usage of nn.utils.pack_padded_sequence in the encoder. Basically, when the input data of padded sequences is sent to a DataParallel, it is split along the batch dimension, but the list containing the lengths is not split, resulting in a size mismatch error as follows:

File "/somepath/torch/nn/parallel/parallel_apply.py", line 45, in parallel_apply
    raise output
ValueError: lengths array has incorrect size

If the input is a tuple of a B*L*D Variable and a B-length list of lengths, where B is the batch size, L is the padded length, and D is the feature size, nn.DataParallel scatters the B*L*D tensor into B/K*L*D chunks, where K is the number of GPUs, while the B-length list is shallow-copied. This causes nn.utils.pack_padded_sequence to raise an error.

Translator replace_unk exception when max attention is on EOS

Translator.py#61
uses attention to replace unknowns in decoded tokens, but sometimes the max attention is on EOS, which causes an exception.

File "/home/user/pytorch-seq2seq/onmt/Translator.py", line 65, in buildTargetTokens
tokens[i] = src[maxIndex[0]]
IndexError: list index out of range

Should we guard against this case? Or maybe exclude specials when finding the max attention token?
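
(A minimal sketch of the second suggestion, restricting the argmax to real source tokens; the helper name is hypothetical, not the repo's code.)

import torch

def pick_source_index(attn_row, num_src_tokens):
    # attn_row: 1-D attention weights over the source positions for one target token.
    # Only consider the real source tokens, so EOS/padding can never win the argmax.
    _, max_index = attn_row[:num_src_tokens].max(0)
    return int(max_index)

attn_row = torch.tensor([0.1, 0.2, 0.1, 0.6])       # the last position (EOS) has the max weight
src = ["das", "ist", "gut"]
print(src[pick_source_index(attn_row, len(src))])   # "ist" instead of an IndexError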

multiple GPUs do not reduce training time

I am trying to use multiple GPUs in training, but I am not able to reduce the training time.

I have a machine with 3 GPUs (GeForce GTX 1080), and I train a network (details below).
I tried different numbers of GPUs (1, 2, or 3) and different batch sizes (64, 128, 192, 248).
Here is a table reporting the time for one epoch:

batch_size    1 GPU    2 GPUs    3 GPUs
64            43 s     78 s      94 s
128           35 s     51 s      60 s
192           32 s     43 s      50 s
248           30 s     40 s      44 s

I also notice that the GPU utilization is quite low when multiple GPUs are used:
with 1 GPU: 80-90% utilization
with 2 GPUs: 45-55% utilization
with 3 GPUs: 35-45% utilization

I am using this setting (gpus and batch_size vary according to the experiments):
Namespace(batch_size=128, brnn=False, brnn_merge='concat', context_gate=None, curriculum=False, data='debugging/model.train.pt', dropout=0.3, encoder_type='text', epochs=13, extra_shuffle=False, gpus=[0], input_feed=1, layers=2, learning_rate=1.0, learning_rate_decay=0.5, log_interval=50, max_generator_batches=32, max_grad_norm=5, optim='sgd', param_init=0.1, pre_word_vecs_dec=None, pre_word_vecs_enc=None, rnn_size=500, rnn_type='LSTM', save_model='debugging/model', seed=-1, start_decay_at=8, start_epoch=1, train_from='', train_from_state_dict='', word_vec_size=500)

and this commit 58c8b52

Why does training speed not scale with the number of GPUs? Rather, adding GPUs seems to slow training down.

Have you already noticed this behavior?
Am I doing any error?

Any comment is welcome.

No such file or directory: 'multi30k_model_e13_*.pt'

I did the training successfully

...
Epoch 13,   450/  454; acc:  75.87; ppl:   2.87; 2975 src tok/s; 3085 tgt tok/s;   1714 s elapsed
Train perplexity: 2.91169
Train accuracy: 75.7449
Validation perplexity: 5.64726
Validation accuracy: 70.0177
Decaying learning rate to 0.015625

and then from the same folder

root@27277298c897:/# python translate.py -gpu 0 -model multi30k_model_e13_*.pt -src data/multi30k/test.en.atok -tgt data/multi30k/test.de.atok -replace_unk -verbose -output multi30k.test.pred.atok
Traceback (most recent call last):
  File "translate.py", line 135, in <module>
    main()
  File "translate.py", line 62, in main
    translator = onmt.Translator(opt)
  File "/musixmatch/onmt/Translator.py", line 12, in __init__
    checkpoint = torch.load(opt.model)
  File "/usr/local/lib/python2.7/dist-packages/torch/serialization.py", line 220, in load
    f = open(f, 'rb')
IOError: [Errno 2] No such file or directory: 'multi30k_model_e13_*.pt'

The root folder content is

-rw-r--r-- 1 root root      1137 Apr  3 15:07 LICENSE.md
-rw-r--r-- 1 root root      3606 Apr  3 15:07 README.md
drwxr-xr-x 5 root root      4096 Apr  3 15:53 data
-rw-r--r-- 1 root root      4826 Apr  3 15:53 multi-bleu.perl
-rw-r--r-- 1 root root 131591914 Apr  3 15:56 multi30k_model_acc_28.59_ppl_86.75_e1.pt
-rw-r--r-- 1 root root 131591914 Apr  3 15:58 multi30k_model_acc_37.08_ppl_38.35_e2.pt
-rw-r--r-- 1 root root 131591914 Apr  3 16:00 multi30k_model_acc_52.30_ppl_15.70_e3.pt
-rw-r--r-- 1 root root 131591914 Apr  3 16:02 multi30k_model_acc_58.82_ppl_10.89_e4.pt
-rw-r--r-- 1 root root 131591914 Apr  3 16:04 multi30k_model_acc_62.22_ppl_8.55_e5.pt
-rw-r--r-- 1 root root 131591914 Apr  3 16:09 multi30k_model_acc_64.36_ppl_7.17_e7.pt
-rw-r--r-- 1 root root 131591914 Apr  3 16:07 multi30k_model_acc_64.60_ppl_7.32_e6.pt
-rw-r--r-- 1 root root 131591914 Apr  3 16:11 multi30k_model_acc_66.44_ppl_6.52_e8.pt
-rw-r--r-- 1 root root 131591914 Apr  3 16:13 multi30k_model_acc_68.62_ppl_5.88_e9.pt
-rw-r--r-- 1 root root 131591914 Apr  3 16:15 multi30k_model_acc_69.69_ppl_5.68_e10.pt
-rw-r--r-- 1 root root 131591914 Apr  3 16:18 multi30k_model_acc_69.71_ppl_5.67_e11.pt
-rw-r--r-- 1 root root 131591914 Apr  3 16:22 multi30k_model_acc_70.02_ppl_5.65_e13.pt
-rw-r--r-- 1 root root 131591914 Apr  3 16:20 multi30k_model_acc_70.04_ppl_5.64_e12.pt
-rw-r--r-- 1 root root    196775 Apr  3 15:53 nonbreaking_prefix.de
-rw-r--r-- 1 root root    159010 Apr  3 15:53 nonbreaking_prefix.en
drwxr-xr-x 4 root root      4096 Apr  3 15:53 onmt
-rw-r--r-- 1 root root      5977 Apr  3 15:07 preprocess.py
-rwxr-xr-x 1 root root       464 Apr  3 15:52 preprocess.sh
-rw-r--r-- 1 root root     16790 Apr  3 15:53 tokenizer.perl
-rw-r--r-- 1 root root     14316 Apr  3 15:07 train.py
-rw-r--r-- 1 root root         0 Apr  3 15:44 train.sh
-rw-r--r-- 1 root root      4774 Apr  3 15:07 translate.py

So I do not see the multi30k_model_e13_*.pt model file there.

No mask is set for Attn during training

In Decoder.forward, no mask is set on the attention model before the attention computation. The softmax will therefore receive 0 (the padding value) as input, and the corresponding output will be exp(0)/sum exp(x_i) != 0.
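
(A minimal sketch of the masking being requested, with illustrative shapes and names rather than the repo's API: padded source positions are filled with -inf before the softmax so they get exactly zero attention weight.)

import torch
import torch.nn.functional as F

scores = torch.randn(4, 1, 9)                  # (batch, tgt_len, src_len) attention scores
src_pad_mask = torch.zeros(4, 1, 9, dtype=torch.bool)
src_pad_mask[:, :, 7:] = True                  # the last two source positions are padding
scores = scores.masked_fill(src_pad_mask, float("-inf"))
align = F.softmax(scores, dim=-1)              # padded positions now receive exactly 0 weight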

translate.py crashes with a medium size seq2seq model

python translate.py -model test_model
GPU : titan X pascal

error message:
THCudaCheck FAIL file=/data/users/soumith/miniconda2/conda-bld/pytorch-cuda80-0.1.10_1488756735684/work/torch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
Traceback (most recent call last):
File "translate.py", line 116, in
main()
File "translate.py", line 77, in main
predBatch, predScore, goldScore = translator.translate(srcBatch, tgtBatch)
File "/home/user_name/openNMT_proj_name/onmt/Translator.py", line 199, in translate
pred, predScore, attn, goldScore = self.translateBatch(batch)
File "/home/user_name/openNMT_proj_name/onmt/Translator.py", line 64, in translateBatch
encStates, context_t = self.model.encoder(srcBatch_t, hidden=encStates)
File "/home/user_name/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 202, in call
result = self.forward(*input, **kwargs)
File "/home/user_name/openNMT_proj_name/onmt/Models.py", line 40, in forward
outputs, hidden_t = self.rnn(emb, hidden)
File "/home/user_name/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 202, in call
result = self.forward(*input, **kwargs)
File "/home/user_name/anaconda2/lib/python2.7/site-packages/torch/nn/modules/rnn.py", line 91, in forward
output, hidden = func(input, self.all_weights, hx)
File "/home/user_name/anaconda2/lib/python2.7/site-packages/torch/nn/_functions/rnn.py", line 327, in forward
return func(input, *fargs, **fkwargs)
File "/home/user_name/anaconda2/lib/python2.7/site-packages/torch/autograd/function.py", line 201, in _do_forward
flat_output = super(NestedIOFunction, self)._do_forward(*flat_input)
File "/home/user_name/anaconda2/lib/python2.7/site-packages/torch/autograd/function.py", line 223, in forward
result = self.forward_extended(*nested_tensors)
File "/home/user_name/anaconda2/lib/python2.7/site-packages/torch/nn/_functions/rnn.py", line 269, in forward_extended
cudnn.rnn.forward(self, input, hx, weight, output, hy)
File "/home/user_name/anaconda2/lib/python2.7/site-packages/torch/backends/cudnn/rnn.py", line 247, in forward
fn.weight_buf = x.new(num_weights)
RuntimeError: cuda runtime error (2) : out of memory at /data/users/soumith/miniconda2/conda-bld/pytorch-cuda80-0.1.10_1488756735684/work/torch/lib/THC/generic/THCStorage.cu:66

Mask for attention

I wonder why we don't apply a mask in GlobalAttention, since we padded zeros on the right. I see there is actually an applyMask function, but it is not used.
Thank you.

Where is multi-bleu.perl?

haixu@my-machine:~/Desktop/OpenNMT-cls-mdn$ perl multi-bleu.perl data/test.de.atok < cls2mdn.test.pred.atok
Can't open perl script "multi-bleu.perl": No such file or directory

AttributeError: 'module' object has no attribute '_cuda_setDevice'

when I try to train the model,
python train.py -data data/multi30k.atok.low.train.pt -save_model multi30k_model -gpus 0
An error occurs. What should I do?
Namespace(batch_size=64, brnn=False, brnn_merge='concat', curriculum=False, data='data/multi30k.atok.low.train.pt', dropout=0.3, epochs=13, extra_shuffle=False, gpus=[0], input_feed=1, layers=2, learning_rate=1.0, learning_rate_decay=0.5, log_interval=50, max_generator_batches=32, max_grad_norm=5, optim='sgd', param_init=0.1, pre_word_vecs_dec=None, pre_word_vecs_enc=None, rnn_size=500, save_model='multi30k_model', start_decay_at=8, start_epoch=1, train_from='', train_from_state_dict='', word_vec_size=500)
Traceback (most recent call last):
  File "train.py", line 119, in <module>
    cuda.set_device(opt.gpus[0])
  File "/home/ljy/anaconda2/lib/python2.7/site-packages/torch/cuda/__init__.py", line 161, in set_device
    torch._C._cuda_setDevice(device)
AttributeError: 'module' object has no attribute '_cuda_setDevice'

Got an error when translating a GRU -brnn model

/Users/ChaiDuo/.virtualenvs/pytorch/bin/python /Users/ChaiDuo/Code/Project/OpenNMT-py/translate.py
Traceback (most recent call last):
  File "/Users/ChaiDuo/Code/Project/OpenNMT-py/translate.py", line 162, in <module>
    main()
  File "/Users/ChaiDuo/Code/Project/OpenNMT-py/translate.py", line 109, in main
    tgtBatch)
  File "/Users/ChaiDuo/Code/Project/OpenNMT-py/onmt/Translator.py", line 268, in translate
    pred, predScore, attn, goldScore = self.translateBatch(src, tgt)
  File "/Users/ChaiDuo/Code/Project/OpenNMT-py/onmt/Translator.py", line 117, in translateBatch
    encStates = (self.model._fix_enc_hidden(encStates[0]),
  File "/Users/ChaiDuo/Code/Project/OpenNMT-py/onmt/Models.py", line 166, in _fix_enc_hidden
    return h.view(h.size(0) // 2, 2, h.size(1), h.size(2)) \
RuntimeError: dimension 2 out of range of 2D tensor at /Users/soumith/code/builder/wheel/pytorch-src/torch/lib/TH/generic/THTensor.c:24

Process finished with exit code 1


`TypeError: 'NoneType' object is not callable` during training

I was trying to train on a Mac with CPU using the following steps:

  1. Preprocess the data and shrink src and tgt to only the first 100 sentences by inserting the following lines after line 133 in preprocess.py:
    shrink = True
    if shrink:
        src = src[0:100]
        tgt = tgt[0:100]

then, I ran

python preprocess.py -train_src data/src-train.txt -train_tgt data/tgt-train.txt -valid_src data/src-val.txt -valid_tgt data/tgt-val.txt -save_data data/demo

  2. Then I trained using python train.py -data data/demo.train.pt -save_model demo_model

It then ran OK for a while before an error appeared:

(dlnd-tf-lab)  ->python train.py -data data/demo.train.pt -save_model demo_model
Namespace(batch_size=64, brnn=False, brnn_merge='concat', curriculum=False, data='data/demo.train.pt', dropout=0.3, epochs=13, extra_shuffle=False, gpus=[], input_feed=1, layers=2, learning_rate=1.0, learning_rate_decay=0.5, log_interval=50, max_generator_batches=32, max_grad_norm=5, optim='sgd', param_init=0.1, pre_word_vecs_dec=None, pre_word_vecs_enc=None, rnn_size=500, save_model='demo_model', start_decay_at=8, start_epoch=1, train_from='', train_from_state_dict='', word_vec_size=500)
Loading data from 'data/demo.train.pt'
 * vocabulary size. source = 24999; target = 35820
 * number of training sentences. 100
 * maximum batch size. 64
Building model...
* number of parameters: 58121320
NMTModel (
  (encoder): Encoder (
    (word_lut): Embedding(24999, 500, padding_idx=0)
    (rnn): LSTM(500, 500, num_layers=2, dropout=0.3)
  )
  (decoder): Decoder (
    (word_lut): Embedding(35820, 500, padding_idx=0)
    (rnn): StackedLSTM (
      (dropout): Dropout (p = 0.3)
      (layers): ModuleList (
        (0): LSTMCell(1000, 500)
        (1): LSTMCell(500, 500)
      )
    )
    (attn): GlobalAttention (
      (linear_in): Linear (500 -> 500)
      (sm): Softmax ()
      (linear_out): Linear (1000 -> 500)
      (tanh): Tanh ()
    )
    (dropout): Dropout (p = 0.3)
  )
  (generator): Sequential (
    (0): Linear (500 -> 35820)
    (1): LogSoftmax ()
  )
)

Train perplexity: 29508.9
Train accuracy: 0.0216306
Validation perplexity: 4.50917e+08
Validation accuracy: 3.57853

Train perplexity: 1.07012e+07
Train accuracy: 0.06198
Validation perplexity: 103639
Validation accuracy: 0.944334

Train perplexity: 458795
Train accuracy: 0.031198
Validation perplexity: 43578.2
Validation accuracy: 3.42942

Train perplexity: 144931
Train accuracy: 0.0432612
Validation perplexity: 78366.8
Validation accuracy: 2.33598
Decaying learning rate to 0.5

Train perplexity: 58696.8
Train accuracy: 0.0278702
Validation perplexity: 14045.8
Validation accuracy: 3.67793
Decaying learning rate to 0.25

Train perplexity: 10045.1
Train accuracy: 0.0457571
Validation perplexity: 26435.6
Validation accuracy: 4.87078
Decaying learning rate to 0.125

Train perplexity: 10301.5
Train accuracy: 0.0490849
Validation perplexity: 24243.5
Validation accuracy: 3.62823
Decaying learning rate to 0.0625

Train perplexity: 7927.77
Train accuracy: 0.062812
Validation perplexity: 7180.49
Validation accuracy: 5.31809
Decaying learning rate to 0.03125

Train perplexity: 4573.5
Train accuracy: 0.047421
Validation perplexity: 6545.51
Validation accuracy: 5.6163
Decaying learning rate to 0.015625

Train perplexity: 3995.7
Train accuracy: 0.0549085
Validation perplexity: 6316.25
Validation accuracy: 5.4175
Decaying learning rate to 0.0078125

Train perplexity: 3715.81
Train accuracy: 0.0540765
Validation perplexity: 6197.91
Validation accuracy: 5.86481
Decaying learning rate to 0.00390625

Train perplexity: 3672.46
Train accuracy: 0.0540765
Validation perplexity: 6144.18
Validation accuracy: 6.01392
Decaying learning rate to 0.00195312

Train perplexity: 3689.7
Train accuracy: 0.0528286
Validation perplexity: 6113.55
Validation accuracy: 6.31213
Decaying learning rate to 0.000976562
Exception ignored in: <function WeakValueDictionary.__init__.<locals>.remove at 0x118b19b70>
Traceback (most recent call last):
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/lib/python3.5/weakref.py", line 117, in remove
TypeError: 'NoneType' object is not callable

Could you tell me how to fix it? Thanks!
