
opennmt / opennmt


Open Source Neural Machine Translation in Torch (deprecated)

Home Page: https://opennmt.net/

License: MIT License

Lua 96.34% Shell 0.23% Perl 0.49% Python 1.21% Forth 1.29% Dockerfile 0.43%
deep-learning lua machine-translation neural-machine-translation opennmt torch

opennmt's Introduction

This project is considered obsolete as the Torch framework is no longer maintained. If you are starting a new project, please use an alternative in the OpenNMT family: OpenNMT-tf (TensorFlow) or OpenNMT-py (PyTorch) depending on your requirements.


OpenNMT: Open-Source Neural Machine Translation

OpenNMT is a full-featured, open-source (MIT) neural machine translation system utilizing the Torch mathematical toolkit.

The system is designed to be simple to use and easy to extend, while maintaining efficiency and state-of-the-art translation accuracy. Features include:

  • Speed and memory optimizations for high-performance GPU training.
  • Simple general-purpose interface, only requires source/target data files.
  • C++ implementation of the translator for easy deployment.
  • Extensions to allow other sequence generation tasks such as summarization and image captioning.

Installation

OpenNMT only requires a Torch installation with few dependencies.

  1. Install Torch
  2. Install additional packages:
luarocks install tds
luarocks install bit32 # if using LuaJIT

For other installation methods including Docker, visit the documentation.

Quickstart

OpenNMT consists of three commands:

  1. Preprocess the data.
th preprocess.lua -train_src data/src-train.txt -train_tgt data/tgt-train.txt -valid_src data/src-val.txt -valid_tgt data/tgt-val.txt -save_data data/demo
  2. Train the model.
th train.lua -data data/demo-train.t7 -save_model model
  3. Translate sentences.
th translate.lua -model model_final.t7 -src data/src-test.txt -output pred.txt

For more details, visit the documentation.

Citation

A technical report on OpenNMT is available. If you use the system for academic work, please cite:

@ARTICLE{2017opennmt,
  author = {{Klein}, G. and {Kim}, Y. and {Deng}, Y. and {Senellart}, J. and {Rush}, A.~M.},
  title = "{OpenNMT: Open-Source Toolkit for Neural Machine Translation}",
  journal = {ArXiv e-prints},
  eprint = {1701.02810}
}

Acknowledgments

Our implementation utilizes code from the following:

Additional resources

opennmt's People

Contributors

akuckartz, anoidgit, arturgontijo, asaluja, aurelien-coquard, aureliensystran, bartvanhalder, bwang-systran, da03, dblandan, dycsystran, guillaumekln, haplology, hrishikeshvganu, jmcrego, jroakes, jsenellart, jungikim, kpu, m4t1ss, mousai, natsegal, panosk, pltrdy, ruanchong, sebastiangehrmann, shahbazsyed, srush, vince62s, yoonkim

opennmt's Issues

How to use pretrained word vectors

Hi there! I'm looking for some guidance on how to use pretrained word vectors, either Google word2vec or GloVe. Any examples of how to convert these from download to a format that can be passed using -pre_word_vecs_enc or -pre_word_vecs_dec would be very helpful.
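A minimal pure-Lua sketch of the first conversion step, assuming GloVe's plain-text format (a word followed by its vector components on each line) and a vocabulary supplied as a list in dictionary index order. The helper names are hypothetical and not part of OpenNMT. Turning the resulting table into a torch.Tensor and writing it with torch.save needs Torch and is not shown; the exact layout that -pre_word_vecs_enc and -pre_word_vecs_dec expect should be checked in the documentation. word2vec's text format starts with a header line (vocabulary size and dimension) that would have to be skipped first.

```lua
-- Hypothetical helpers (not part of OpenNMT): read GloVe-style text lines
-- ("word v1 v2 ... vd") and align the vectors with a vocabulary list.

-- Parse lines into a table mapping each word to its vector.
local function parseEmbeddings(lines)
  local vectors, dim = {}, nil
  for _, line in ipairs(lines) do
    local fields = {}
    for field in line:gmatch("%S+") do
      fields[#fields + 1] = field
    end
    if #fields > 1 then
      local vec = {}
      for i = 2, #fields do
        vec[i - 1] = tonumber(fields[i])
      end
      dim = dim or #vec
      assert(#vec == dim, "inconsistent vector size for " .. fields[1])
      vectors[fields[1]] = vec
    end
  end
  return vectors, dim
end

-- Row i of the result is the vector of vocab[i]; words missing from the
-- embedding file get a zero vector (an arbitrary choice for this sketch).
local function alignToVocabulary(vocab, vectors, dim)
  local matrix, found = {}, 0
  for i, word in ipairs(vocab) do
    local vec = vectors[word]
    if vec then
      found = found + 1
    else
      vec = {}
      for j = 1, dim do
        vec[j] = 0
      end
    end
    matrix[i] = vec
  end
  return matrix, found
end

-- Usage with two inline "file" lines and a three-word vocabulary.
local vectors, dim = parseEmbeddings({"the 0.1 0.2", "cat 0.3 0.4"})
local matrix, found = alignToVocabulary({"<blank>", "the", "cat"}, vectors, dim)
print(dim, found, #matrix)  -- prints 2, 2 and 3
```

Reporting how many vocabulary words were found is a cheap sanity check that the tokenization of the embedding file matches the one used for preprocessing.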

what is the meaning of 'max_sent_length' in translate option?

From the options:
[Maximum sentence length. If any sequences in srcfile are longer than this then it will error out]

My understanding is that when a source sentence is longer than this value, translation fails with an error.

But when I went into the code, I found it is more about the longest translated sentence that can be generated. Is my understanding wrong? What is the actual meaning of the translate option max_sent_length?

Error when converting GPU model to CPU model

I'm trying to convert a pretrained GPU model to a CPU model: th ./tools/release_model.lua -model pretrained/onmt_baseline_wmt15-all.de-en_epoch13_8.98.t7 -output_model pretrained/onmt-wmt15-cpu.t7

I got the following error:

stack traceback:
[C]: in function 'error'
/Users/laam/torch/install/share/lua/5.1/torch/File.lua:343: in function 'readObject'
/Users/laam/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
/Users/laam/torch/install/share/lua/5.1/nn/Module.lua:184: in function 'read'
/Users/laam/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
/Users/laam/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
/Users/laam/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
/Users/laam/torch/install/share/lua/5.1/torch/File.lua:353: in function 'readObject'
/Users/laam/torch/install/share/lua/5.1/torch/File.lua:368: in function 'readObject'
/Users/laam/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
/Users/laam/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
...
/Users/laam/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
/Users/laam/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
/Users/laam/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
/Users/laam/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
/Users/laam/torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
./tools/release_model.lua:35: in function 'main'
./tools/release_model.lua:52: in main chunk
[C]: in function 'dofile'
...laam/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x0104997d10

How to fix that?

Question related to the Arabic Dataset

In the demo, the Arabic-English translation is very accurate. It can also translate different dialects. I was wondering how I can get the dataset (parallel corpora) you used to train it. Is it available to the public? Or can I purchase it?

Last minibatch ignored

Looks like onmt/data/Dataset.lua skips the last minibatch when creating offsets. Here's a PR to fix it: #66
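The off-by-one described here can be illustrated with a generic sketch (not the actual Dataset.lua code, which this sketch does not reproduce): computing the number of batches with math.floor drops a final, smaller batch, while math.ceil keeps it. makeBatchOffsets is a hypothetical name.

```lua
-- Generic illustration (not the actual Dataset.lua code): split n examples
-- into consecutive batches of at most batchSize, as {start, size} pairs.
local function makeBatchOffsets(n, batchSize)
  local offsets = {}
  -- math.ceil keeps the final, smaller batch; math.floor would drop it.
  local numBatches = math.ceil(n / batchSize)
  for b = 1, numBatches do
    local start = (b - 1) * batchSize + 1
    local size = math.min(batchSize, n - start + 1)
    offsets[#offsets + 1] = {start, size}
  end
  return offsets
end

for _, o in ipairs(makeBatchOffsets(10, 4)) do
  print(o[1], o[2])  -- prints 1 4, then 5 4, then 9 2
end
```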

preprocess.lua

This command line throws an error:

th preprocess.lua -src_vocab_size 50000 -tgt_vocab_size 50000 \
-train_src data/europarl-v7.fr-en.$sl.tok \
-train_tgt data/europarl-v7.fr-en.$tl.tok \
-valid_src data/generic_valid.$sl.tok \
-valid_tgt data/generic_valid.$tl.tok -save_data exp/model-$sl$tl

I checked that the 4 tok files were tokenized the same way (-case_feature and -sep_annotate).

Any clue?

torch/install/bin/luajit: preprocess.lua:66: all sentences must have the same numbers of additional features
stack traceback:
[C]: in function 'assert'
preprocess.lua:66: in function 'makeVocabulary'
preprocess.lua:124: in function 'initVocabulary'
preprocess.lua:276: in function 'main'
preprocess.lua:311: in main chunk
[C]: in function 'dofile'
...oses/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00406670

weird output

Is this some kind of flaw caused by the input ending with a ","?
If so, by what rule is the output cut? Max sentence length?

Thanks.

SENT 635: As an analyst anxious about a familiar strangeness , he wanders in a forest of symbols , stops after each step to examine an object in the form of a puzzle , tormented by irritating questions ,
PRED 635: soucieux d ' un familier , il s ' est dans une forêt de symboles , s ' arrête après chaque étape d ' un puzzle , sous la forme d ' un puzzle , tourmenté par des questions , par des questions , par des questions , par des questions , par des questions , par des questions , par des questions , par des questions , par des questions , par des questions , par des questions , par des questions , par des questions , par des questions , par des questions , par des questions , par des questions , par des questions .

Teacher forcing

I wanted to implement teacher forcing as in Pascanau et al. (JMLR), so that we feed in the predicted previous label $\hat{y}_{t}$ instead of the ground-truth label $y_{t}$. Any plans for adding this feature? I can work on the code if you provide pointers.

Note: I don't know Lua but am willing to try if I get some pointers.
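For reference, standard teacher forcing feeds the ground-truth token $y_{t}$ to the next decoder step; feeding the model's own prediction $\hat{y}_{t}$ instead, with some probability, is usually called scheduled sampling (Bengio et al., 2015). A minimal pure-Lua sketch of that choice, where decodeStep is a hypothetical stand-in for one decoder step and not an OpenNMT function:

```lua
-- Sketch of scheduled sampling. decodeStep(prevToken, t) is a hypothetical
-- stand-in for one decoder step that returns the predicted token at t.
-- teacherProb = 1 gives plain teacher forcing; teacherProb = 0 always feeds
-- the model's own predictions.
local function runDecoder(gold, bos, teacherProb, decodeStep, rand)
  rand = rand or math.random  -- math.random() returns a number in [0, 1)
  local inputs, outputs = {}, {}
  local prev = bos
  for t = 1, #gold do
    inputs[t] = prev
    outputs[t] = decodeStep(prev, t)
    if rand() < teacherProb then
      prev = gold[t]      -- ground-truth y_t is fed to the next step
    else
      prev = outputs[t]   -- predicted y_hat_t is fed to the next step
    end
  end
  return inputs, outputs
end

local step = function(prev, t) return "p" .. t end
local teacherIn = runDecoder({"a", "b", "c"}, "<s>", 1, step)
local freeIn = runDecoder({"a", "b", "c"}, "<s>", 0, step)
print(table.concat(teacherIn, " "))  -- prints <s> a b
print(table.concat(freeIn, " "))     -- prints <s> p1 p2
```

In a real decoder, teacherProb is typically decreased over training so the model gradually learns to consume its own predictions.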

Why does the program throw "not enough memory" when writing the model?

I think the model has already been fully built in memory, so why does it throw this error in the saving phase? How can I solve this problem?

P.S. I have enough disk space.

th preprocess.lua -train_src mydata/src-train.txt -train_tgt mydata/tgt-train.txt -valid_src mydata/src-val.txt -valid_tgt mydata/tgt-val.txt -save_data mydata/demo200w -seq_length 100 -src_vocab_size 300000 -tgt_vocab_size 100000
Building source vocabulary...
Created dictionary of size 300004 (pruned from 616963)

Building target vocabulary...
Created dictionary of size 100004 (pruned from 208106)

Preparing training data...
... 100000 sentences prepared
... 200000 sentences prepared
... 300000 sentences prepared
... 400000 sentences prepared
... 500000 sentences prepared
... 600000 sentences prepared
... 700000 sentences prepared
... 800000 sentences prepared
... 900000 sentences prepared
... 1000000 sentences prepared
... 1100000 sentences prepared
... 1200000 sentences prepared
... 1300000 sentences prepared
... 1400000 sentences prepared
... 2200000 sentences prepared
... 2300000 sentences prepared
... 2400000 sentences prepared
... shuffling sentences
... sorting sentences by size
Prepared 2436170 sentences (8332 ignored due to length == 0 or > 100)

Preparing validation data...
... shuffling sentences
... sorting sentences by size
Prepared 9968 sentences (32 ignored due to length == 0 or > 100)

Saving source vocabulary to 'mydata/demo200w.src.dict'...
Saving target vocabulary to 'mydata/demo200w.tgt.dict'...
Saving data to 'mydata/demo200w-train.t7'...
/home/Programs/bin/luajit: not enough memory

Can CNN buffers be shared ?

I'm wondering: if a CNN is included in the Sequencer module, could its gradInput and output buffers be shared across clones (assuming memory pre-allocation is enabled)?
For instance, nn.SpatialConvolution, nn.SpatialMaxPooling and their cudnn alternatives.

The model seems to have dropped some training data?

I built the training data from a file that contains 1,000,000 training sentences. However, the log shows only 117706:

 * vocabulary size: source = 50004; target = 50004
 * additional features: source = 0; target = 0
 * maximum sequence length: source = 50; target = 49
 * number of training sentences: 117706
 * maximum batch size: 64

Does preprocessing drop a lot of data?
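preprocess.lua does drop pairs for length reasons: the log in the earlier memory issue reports "Prepared 2436170 sentences (8332 ignored due to length == 0 or > 100)", and the maximum source length of 50 in the log above is consistent with such a limit. The sketch below, with hypothetical helper names, mimics that kind of filter so you can count how many pairs of your own data a given limit would discard. That the real script applies the limit to both sides exactly this way is an assumption.

```lua
-- Hypothetical helpers mimicking a length filter like the one reported in
-- the preprocess.lua log ("ignored due to length == 0 or > N").
local function countTokens(line)
  local n = 0
  for _ in line:gmatch("%S+") do
    n = n + 1
  end
  return n
end

local function filterByLength(srcLines, tgtLines, seqLength)
  local kept, ignored = {}, 0
  for i = 1, #srcLines do
    local ls, lt = countTokens(srcLines[i]), countTokens(tgtLines[i])
    if ls > 0 and ls <= seqLength and lt > 0 and lt <= seqLength then
      kept[#kept + 1] = {srcLines[i], tgtLines[i]}
    else
      ignored = ignored + 1
    end
  end
  return kept, ignored
end

local kept, ignored = filterByLength({"a b", "", "a b c d"}, {"x", "y", "z"}, 3)
print(#kept, ignored)  -- prints 1 and 2
```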

Error loading pretrained model

When loading the pre-trained model I get this error:

th translate.lua -model pretrained_models/onmt_baseline_wmt15-all.de-en_epoch13_8.98.t7 -src data/test.de                                                   
Loading 'pretrained_models/onmt_baseline_wmt15-all.de-en_epoch13_8.98.t7'...    
/root/torch/install/bin/luajit: /root/torch/install/share/lua/5.1/torch/File.lua:343: unknown Torch class <torch.CudaTensor>
stack traceback:
    [C]: in function 'error'
    /root/torch/install/share/lua/5.1/torch/File.lua:343: in function 'readObject'
    /root/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
    /root/torch/install/share/lua/5.1/nn/Module.lua:184: in function 'read'
    /root/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
    /root/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
    /root/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
    /root/torch/install/share/lua/5.1/torch/File.lua:353: in function 'readObject'
    /root/torch/install/share/lua/5.1/torch/File.lua:368: in function 'readObject'
    /root/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
    /root/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
    ...
    /root/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
    /root/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
    /root/torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
    ./onmt/translate/Translator.lua:29: in function '__init'
    /root/torch/install/share/lua/5.1/torch/init.lua:91: in function 'new'
    translate.lua:65: in function 'main'
    translate.lua:193: in main chunk
    [C]: in function 'dofile'
    /root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00406670

There is a simple (if dirty) workaround for this problem: load cunn and cudnn in onmt/translate/Translator.lua.

error in releasing model

th tools/release_model.lua -model model/demo_epoch4_1503.07.t7 -output_model model/cdemo_epoch4_1503.07.t7 -gpuid 1

Loading model 'model/demo_epoch4_1503.07.t7'...
... done.
Converting model...
/root/torch/install/bin/luajit: tools/release_model.lua:49: attempt to call method 'float' (a nil value)
stack traceback:
tools/release_model.lua:49: in function 'main'
tools/release_model.lua:60: in main chunk
[C]: in function 'dofile'
/root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

Some errors when using multi-GPU

I am trying to train on multiple GPUs. When I test it, there are some problems:

  1. When the options -nparallel 2 and -gpuid 14 are set, the first and second GPUs are used (I have 16 GPUs), so I can't choose specific GPUs when training on multiple GPUs. The first and second GPUs are already running other programs. Here is the error:

Loading data from './data/test-train.t7'...
 * vocabulary size: source = 10002; target = 10004
 * additional features: source = 0; target = 0
 * maximum sequence length: source = 50; target = 51
 * number of training sentences: 9453
 * maximum batch size: 16
Building model...
 * using input feeding
Initializing parameters...
 * number of parameters: 28777004
Preparing memory optimization...
 * sharing 70% of output/gradInput tensors memory between clones
Start training...

/home/beichao/torch/install/bin/lua: ./onmt/train/Optim.lua:83: Assertion `THCTensor_(checkGPU)(state, 1, self)' failed. at /home/beichao/torch/extra/cutorch/lib/THC/generated/../generic/THCTensorMathReduce.cu:180
stack traceback:
[C]: in function 'norm'
./onmt/train/Optim.lua:83: in function 'prepareGrad'
train.lua:294: in function 'trainEpoch'
train.lua:411: in function 'trainModel'
train.lua:548: in function 'main'
train.lua:551: in main chunk
[C]: in function 'dofile'
...chao/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: in ?

  2. When the options -nparallel 2, -gpuid 14 and -async_parallel are set, the first and second GPUs are still used. The first epoch runs, but when saving the model there is an error:

Loading data from './data/test-train.t7'...
 * vocabulary size: source = 10002; target = 10004
 * additional features: source = 0; target = 0
 * maximum sequence length: source = 50; target = 51
 * number of training sentences: 9453
 * maximum batch size: 16
Building model...
 * using input feeding
Initializing parameters...
 * number of parameters: 28777004
Preparing memory optimization...
 * sharing 70% of output/gradInput tensors memory between clones
Start training...

Epoch 1 ; ... batch 50/610
Epoch 1 ; ... batch 100/610
Epoch 1 ; ... batch 150/610
Epoch 1 ; ... batch 200/610
Epoch 1 ; ... batch 250/610
Epoch 1 ; ... batch 300/610
Epoch 1 ; ... batch 350/610
Epoch 1 ; ... batch 400/610
Epoch 1 ; ... batch 450/610
Epoch 1 ; ... batch 500/610
Epoch 1 ; ... batch 550/610
Epoch 1 ; ... batch 600/610
Epoch 1 ; Iteration 612/610 ; Learning rate 1.0000 ; Source tokens/s 620 ; Perplexity 10669.35
Saving checkpoint to 'en2zh_test_model_checkpoint.t7'...
/home/beichao/torch/install/bin/lua: ./onmt/utils/Tensor.lua:81: Assertion `THCTensor_(checkGPU)(state, 1, self_)' failed. at /home/beichao/torch/extra/cutorch/lib/THC/generic/THCTensorMath.cu:21
stack traceback:
[C]: in function 'zero'
./onmt/utils/Tensor.lua:81: in function 'reuseTensor'
./onmt/utils/Tensor.lua:120: in function 'initTensorTable'
./onmt/modules/BiEncoder.lua:118: in function 'forward'
train.lua:166: in function 'eval'
train.lua:413: in function 'trainModel'
train.lua:548: in function 'main'
train.lua:551: in main chunk
[C]: in function 'dofile'
...chao/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: in ?

In short, I hit the same problems with both synchronous and asynchronous parallelism: I can't specify the GPUs, and the assertion `THCTensor_(checkGPU)(state, 1, self_)' fails. Can anyone help me? Thanks anyway.

Parallel decoding on Multi Core CPU server

Hi,

Decoding on a CPU server is very slow (approximately 4 days for 15K sentences). Is it possible to add parallel decoding to OpenNMT? It would help when the test set is large.
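Until something is built in, one workaround is to split the test set into shards, run one translate.lua process per shard in parallel, and concatenate the outputs in shard order. A minimal pure-Lua sketch of the splitting step (makeShards is a hypothetical helper); writing each shard to its own file and launching the processes is left out.

```lua
-- Hypothetical helper: split lines into n contiguous shards so that each can
-- be translated by its own process; concatenating the shard outputs in order
-- restores the original sentence order.
local function makeShards(lines, n)
  local shards = {}
  local per = math.ceil(#lines / n)
  for s = 1, n do
    local shard = {}
    for i = (s - 1) * per + 1, math.min(s * per, #lines) do
      shard[#shard + 1] = lines[i]
    end
    shards[s] = shard
  end
  return shards
end

local lines = {}
for i = 1, 10 do lines[i] = "sentence " .. i end
for s, shard in ipairs(makeShards(lines, 3)) do
  print(s, #shard)  -- shard sizes 4, 4 and 2
end
```

Contiguous shards (rather than round-robin) keep the merge step a plain concatenation.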

memory consumption after loading checkpoint

When training is launched for the first time, everything works fine.
But after an interruption and a reload, it stops with an "out of memory" error (on the GPU).
It seems that reloading a checkpoint consumes more memory than initializing a new model.
Could anyone advise how to avoid this?
Many thanks.

Indicate progress while training

After "Start training..." it usually takes a long time until there is the next output. Please add some indication that there is progress.

Have model.lua implement fwd/bwd interface

Feature request from Yuntian.

He is constructing a different model for image captioning, but would like to use the same training code. He suggested this would be possible if model.lua somehow implemented the fwd/bwd interface.
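A sketch of the idea, with hypothetical method names: the training loop below only calls forward, backward and update on whatever model it receives, so a captioning model could reuse it by implementing the same three methods. A toy linear model stands in for a real network; none of this is OpenNMT's actual API.

```lua
-- Hypothetical interface: any model passed to train() must provide
--   model:forward(batch)  -> loss (number)
--   model:backward(batch) -> computes gradients for that batch
--   model:update(lr)      -> applies the gradients
local function train(model, batches, lr, epochs)
  local loss
  for _ = 1, epochs do
    loss = 0
    for _, batch in ipairs(batches) do
      loss = loss + model:forward(batch)
      model:backward(batch)
      model:update(lr)
    end
  end
  return loss  -- summed loss of the last epoch
end

-- Toy model (y = w * x, mean squared error) used only to exercise the interface.
local Toy = {}
Toy.__index = Toy

function Toy.new()
  return setmetatable({w = 0, grad = 0}, Toy)
end

function Toy:forward(batch)
  local loss = 0
  for _, ex in ipairs(batch) do
    local err = self.w * ex.x - ex.y
    loss = loss + err * err
  end
  return loss / #batch
end

function Toy:backward(batch)
  local g = 0
  for _, ex in ipairs(batch) do
    g = g + 2 * (self.w * ex.x - ex.y) * ex.x
  end
  self.grad = g / #batch
end

function Toy:update(lr)
  self.w = self.w - lr * self.grad
end

local model = Toy.new()
local batches = {{{x = 1, y = 3}, {x = 2, y = 6}}}
local lastLoss = train(model, batches, 0.1, 200)
print(model.w)  -- close to 3
```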

The training process stops right after epoch 13 and can't be continued.

Here is the output of the -continue command:

Loading checkpoint 'model_epoch13_214.48.t7'...	
 * vocabulary size: source = 50004; target = 50004	
 * additional features: source = 0; target = 0	
 * maximum sequence length: source = 50; target = 51	
 * number of training sentences: 145525	
 * maximum batch size: 64	
Building model...	
Initializing parameters...	
Resuming training from epoch 14 at iteration 1...	
Loading data from 'data/demo-train.t7'...	
 * number of parameters: 84814004	
Preparing memory optimization...	
 * sharing 69% of output/gradInput tensors memory between clones	
Start training...	
 * vocabulary size: source = 50004; target = 50004	
 * additional features: source = 0; target = 0	
 * maximum sequence length: source = 50; target = 51	
 * number of training sentences: 145525	
 * maximum batch size: 64	
Building model...	
Initializing parameters...	
 * number of parameters: 84814004	
Preparing memory optimization...	
 * sharing 69% of output/gradInput tensors memory between clones	
Start training...	
[1]+  Done                    th train.lua -data data/demo-train.t7 -save_model model -train_from model_epoch13_214.48.t7 -save_every 20 -gpuid 4 -continue

Simpler cloning?

This may be controversial, but can't we just do the per-timestep cloning in Sequencer with
network:clone('weight', 'gradWeight', 'bias', 'gradBias')

In my experience this leads to much faster creation of clones (and gives the same results). Here's a PR that does this: #67

Models link is broken

The models link at the end of the readme is broken and I couldn't find a working one.

All words must have the same number of features

I am trying to perform preprocessing on data other than the demo data and ended up getting the following error:

/usr/bin/luajit: ./onmt/utils/Features.lua:17: all words must have the same number of features
stack traceback:
[C]: in function 'assert'
./onmt/utils/Features.lua:17: in function 'extract'
preprocess.lua:56: in function 'makeVocabulary'
preprocess.lua:122: in function 'initVocabulary'
preprocess.lua:266: in function 'main'
preprocess.lua:301: in main chunk
[C]: in function 'dofile'
/usr/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

Also, I am wondering whether there will be a preprocessor-shards.py here: as the data grows to a few GB, fitting it into memory will not be possible with preprocessing.py.
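Both this error and the "all sentences must have the same numbers of additional features" error in the earlier preprocess.lua issue point at inconsistent word features. A quick way to locate the offending lines is a checker like the hypothetical one below. It assumes features are attached to words with OpenNMT's "￨" separator (U+FFE8), which is my understanding of the format rather than something stated here; pass another separator if your data uses one. A stray separator character inside ordinary text would also be flagged.

```lua
-- Hypothetical checker: report lines whose words carry different numbers of
-- features, or a different number than the first non-empty line.
-- The default separator is assumed to be U+FFE8, written as UTF-8 bytes.
local DEFAULT_SEP = "\239\191\168"

-- Count occurrences of sep in word (plain search, no patterns).
local function countFeatures(word, sep)
  local count, pos = 0, 1
  while true do
    local s, e = word:find(sep, pos, true)
    if not s then
      break
    end
    count = count + 1
    pos = e + 1
  end
  return count
end

local function findInconsistentLines(lines, sep)
  sep = sep or DEFAULT_SEP
  local bad, corpusCount = {}, nil
  for lineNo, line in ipairs(lines) do
    local lineCount, ok = nil, true
    for word in line:gmatch("%S+") do
      local n = countFeatures(word, sep)
      lineCount = lineCount or n
      if n ~= lineCount then
        ok = false
        break
      end
    end
    if ok and lineCount ~= nil then
      corpusCount = corpusCount or lineCount
      ok = (lineCount == corpusCount)
    end
    if not ok then
      bad[#bad + 1] = lineNo
    end
  end
  return bad
end

-- Demo with an ASCII "|" separator: line 2 mixes 1 and 0 features,
-- line 3 has 0 features while line 1 has 1.
local bad = findInconsistentLines({"a|X b|Y", "c|Z d", "e f", "g|1 h|2"}, "|")
print(table.concat(bad, ","))  -- prints 2,3
```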

HOWTO on using -phrase_table option needed

It would be great to have documentation on the format of the resource file for the -phrase_table option, as promised in the help message of translate.lua. I have tried the following, but it does not seem to work when translating from English to German using the model from [http://opennmt.net//Models/]:

probation period|||Probezeit
employment duty|||Tätigkeit

Additional comments on letter case in the phrase table and some details of the substitution algorithms would be appreciated as well.
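The lines above use "|||" between source and target. As far as I understand (an assumption worth checking against translate.lua), the phrase table is only consulted when -replace_unk replaces an <unk> token: the source token with the highest attention weight is looked up, and its entry is used instead of a plain copy. If that is right, multi-word entries such as "probation period" could never match, because the lookup key is a single token. The sketch below illustrates that lookup with hypothetical helper names and a given alignment; it is not translate.lua's code.

```lua
-- Hypothetical helpers illustrating <unk> replacement with a phrase table.

-- Parse "source|||target" lines into a lookup table; lines without the
-- separator are skipped.
local function loadPhraseTable(lines)
  local tbl = {}
  for _, line in ipairs(lines) do
    local s, e = line:find("|||", 1, true)
    if s then
      tbl[line:sub(1, s - 1)] = line:sub(e + 1)
    end
  end
  return tbl
end

-- align[i] is the index of the source token most attended to when output
-- token i was produced (assumed to be given).
local function replaceUnk(outTokens, srcTokens, align, phraseTable)
  local result = {}
  for i, tok in ipairs(outTokens) do
    if tok == "<unk>" then
      local srcTok = srcTokens[align[i]]
      -- use the phrase-table translation if any, else copy the source token
      result[i] = phraseTable[srcTok] or srcTok
    else
      result[i] = tok
    end
  end
  return result
end

local pt = loadPhraseTable({"probation|||Probezeit", "employment duty|||Tätigkeit"})
local out = replaceUnk({"die", "<unk>", "<unk>"}, {"the", "probation", "duty"}, {1, 2, 3}, pt)
print(table.concat(out, " "))  -- prints die Probezeit duty
```

Note that the lookup is an exact string match, so letter case in the table would have to match the tokenized source.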

-continue causes "invalid learningRate" error

$ onmt_train -data preprocess-train.t7 -save_model model -gpuid 1 -continue -train_from model_epoch2.t7
/home/ubuntu/torch/install/bin/luajit: .../ubuntu/torch/install/share/lua/5.1/onmt/train/Optim.lua:103: attempt to perform arithmetic on field 'learningRate' (a nil value)
stack traceback:
        .../ubuntu/torch/install/share/lua/5.1/onmt/train/Optim.lua:103: in function 'prepareGrad'
        .../install/lib/luarocks/rocks/opennmt/scm-1/bin/onmt_train:262: in function 'trainEpoch'
        .../install/lib/luarocks/rocks/opennmt/scm-1/bin/onmt_train:293: in function 'trainModel'
        .../install/lib/luarocks/rocks/opennmt/scm-1/bin/onmt_train:430: in function 'main' 

The same error occurs when manually passing -continue -learning_rate X.

Add the validation perplexity

The training log does not print the validation perplexity; adding it would be helpful.

Epoch 1 ; Iteration 10/183 ; Learning rate 1.0000 ; Source tokens/s 196 ; Perplexity 157766.75
Epoch 1 ; Iteration 20/183 ; Learning rate 1.0000 ; Source tokens/s 318 ; Perplexity 138999.74
Epoch 1 ; Iteration 30/183 ; Learning rate 1.0000 ; Source tokens/s 415 ; Perplexity 174117.53
Epoch 1 ; Iteration 40/183 ; Learning rate 1.0000 ; Source tokens/s 491 ; Perplexity 619423.11
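For reference, perplexity is the exponential of the average per-token negative log-likelihood; computed over the validation set instead of training batches, it gives the validation perplexity requested here (later logs in this list do print a "Validation perplexity" line). A minimal sketch, assuming the natural-log probabilities of the reference tokens are already available; the function name is hypothetical.

```lua
-- Sketch: perplexity = exp(-(1/N) * sum of log p(token)) over N target tokens.
-- logProbs holds natural-log probabilities of the reference tokens, e.g.
-- accumulated over the whole validation set.
local function perplexity(logProbs)
  local nll = 0
  for _, lp in ipairs(logProbs) do
    nll = nll - lp
  end
  return math.exp(nll / #logProbs)
end

-- A model that gives every reference token probability 1/4 has perplexity
-- (approximately, in floating point) 4.
local lp = math.log(0.25)
print(perplexity({lp, lp, lp}))
```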

start decay after resume without "-continue"

Here is the scenario.
First training:
5 epochs, lr 1.0, start decay 5, decay_rate 0.5

Resuming training with -continue then works fine:
start_epoch 6, end_epoch 9 ==> lr of epoch 6 = 0.5

If I don't use -continue, the lr of epoch 6 starts at 1 and does not decay right away, even though I am passing the decay options.
I could always use -continue, but one might want to change the decay rate after a resume.

Is that clear?

typo in guide

  1. Translate sentences.

th evaluate.lua -model model_final.t7 -src data/src-val.txt [-gpuid 1]

===> should be th translate.lua

Strange learning rate decay strategy

My understanding is that the learning rate is decayed once when (i) the validation perplexity does not decrease, or decayed continuously when (ii) the epoch has gone past the start_decay_at limit. But the logs show that once the validation perplexity has failed to decrease, the learning rate keeps decaying every epoch. I don't think this is a good decay strategy.

Epoch 3 ; Iteration 160/183 ; Learning rate 1.0000 ; Source tokens/s 1202 ; Perplexity 2945.51
Epoch 3 ; Iteration 170/183 ; Learning rate 1.0000 ; Source tokens/s 1208 ; Perplexity 2849.41
Epoch 3 ; Iteration 180/183 ; Learning rate 1.0000 ; Source tokens/s 1206 ; Perplexity 2788.46
Validation perplexity: 1716.2622535092
Saving checkpoint to 'model/demo_epoch3_1716.26.t7'...

Epoch 4 ; Iteration 10/183 ; Learning rate 1.0000 ; Source tokens/s 1354 ; Perplexity 1575.35
Epoch 4 ; Iteration 20/183 ; Learning rate 1.0000 ; Source tokens/s 1283 ; Perplexity 1495.37
Epoch 4 ; Iteration 30/183 ; Learning rate 1.0000 ; Source tokens/s 1282 ; Perplexity 1535.33
Epoch 4 ; Iteration 40/183 ; Learning rate 1.0000 ; Source tokens/s 1254 ; Perplexity 1510.44
Epoch 4 ; Iteration 50/183 ; Learning rate 1.0000 ; Source tokens/s 1242 ; Perplexity 1522.32
Epoch 4 ; Iteration 60/183 ; Learning rate 1.0000 ; Source tokens/s 1233 ; Perplexity 1498.31
Epoch 4 ; Iteration 70/183 ; Learning rate 1.0000 ; Source tokens/s 1260 ; Perplexity 1490.61
Epoch 4 ; Iteration 80/183 ; Learning rate 1.0000 ; Source tokens/s 1258 ; Perplexity 1450.23
Epoch 4 ; Iteration 90/183 ; Learning rate 1.0000 ; Source tokens/s 1229 ; Perplexity 1422.15
Epoch 4 ; Iteration 100/183 ; Learning rate 1.0000 ; Source tokens/s 1207 ; Perplexity 1413.93
Epoch 4 ; Iteration 110/183 ; Learning rate 1.0000 ; Source tokens/s 1224 ; Perplexity 1460.59
Epoch 4 ; Iteration 120/183 ; Learning rate 1.0000 ; Source tokens/s 1229 ; Perplexity 1442.72
Epoch 4 ; Iteration 130/183 ; Learning rate 1.0000 ; Source tokens/s 1216 ; Perplexity 1409.24
Epoch 4 ; Iteration 140/183 ; Learning rate 1.0000 ; Source tokens/s 1198 ; Perplexity 1379.64
Epoch 4 ; Iteration 150/183 ; Learning rate 1.0000 ; Source tokens/s 1194 ; Perplexity 1346.32
Epoch 4 ; Iteration 160/183 ; Learning rate 1.0000 ; Source tokens/s 1195 ; Perplexity 1319.50
Epoch 4 ; Iteration 170/183 ; Learning rate 1.0000 ; Source tokens/s 1192 ; Perplexity 1306.23
Epoch 4 ; Iteration 180/183 ; Learning rate 1.0000 ; Source tokens/s 1197 ; Perplexity 1293.47
Validation perplexity: 1804.201667505
Saving checkpoint to 'model/demo_epoch4_1804.20.t7'...

Epoch 5 ; Iteration 10/183 ; Learning rate 0.5000 ; Source tokens/s 1298 ; Perplexity 799.44
Epoch 5 ; Iteration 20/183 ; Learning rate 0.5000 ; Source tokens/s 1272 ; Perplexity 740.41
Epoch 5 ; Iteration 30/183 ; Learning rate 0.5000 ; Source tokens/s 1254 ; Perplexity 728.96
Epoch 5 ; Iteration 40/183 ; Learning rate 0.5000 ; Source tokens/s 1253 ; Perplexity 718.26
Epoch 5 ; Iteration 50/183 ; Learning rate 0.5000 ; Source tokens/s 1232 ; Perplexity 708.09
Epoch 5 ; Iteration 60/183 ; Learning rate 0.5000 ; Source tokens/s 1215 ; Perplexity 694.47
Epoch 5 ; Iteration 70/183 ; Learning rate 0.5000 ; Source tokens/s 1218 ; Perplexity 685.28
Epoch 5 ; Iteration 80/183 ; Learning rate 0.5000 ; Source tokens/s 1231 ; Perplexity 683.29
Epoch 5 ; Iteration 90/183 ; Learning rate 0.5000 ; Source tokens/s 1217 ; Perplexity 675.64
Epoch 5 ; Iteration 100/183 ; Learning rate 0.5000 ; Source tokens/s 1234 ; Perplexity 673.04
Epoch 5 ; Iteration 110/183 ; Learning rate 0.5000 ; Source tokens/s 1222 ; Perplexity 666.83
Epoch 5 ; Iteration 120/183 ; Learning rate 0.5000 ; Source tokens/s 1238 ; Perplexity 661.42
Epoch 5 ; Iteration 130/183 ; Learning rate 0.5000 ; Source tokens/s 1228 ; Perplexity 656.02
Epoch 5 ; Iteration 140/183 ; Learning rate 0.5000 ; Source tokens/s 1223 ; Perplexity 657.60
Epoch 5 ; Iteration 150/183 ; Learning rate 0.5000 ; Source tokens/s 1206 ; Perplexity 648.93
Epoch 5 ; Iteration 160/183 ; Learning rate 0.5000 ; Source tokens/s 1201 ; Perplexity 645.08
Epoch 5 ; Iteration 170/183 ; Learning rate 0.5000 ; Source tokens/s 1195 ; Perplexity 640.99
Epoch 5 ; Iteration 180/183 ; Learning rate 0.5000 ; Source tokens/s 1205 ; Perplexity 640.26
Validation perplexity: 963.50181209573
Saving checkpoint to 'model/demo_epoch5_963.50.t7'...

Epoch 6 ; Iteration 10/183 ; Learning rate 0.2500 ; Source tokens/s 1244 ; Perplexity 532.09
Epoch 6 ; Iteration 20/183 ; Learning rate 0.2500 ; Source tokens/s 1222 ; Perplexity 501.11
Epoch 6 ; Iteration 30/183 ; Learning rate 0.2500 ; Source tokens/s 1203 ; Perplexity 488.91
Epoch 6 ; Iteration 40/183 ; Learning rate 0.2500 ; Source tokens/s 1176 ; Perplexity 479.09
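The behaviour in these logs can be reproduced by a rule in which decay, once triggered, stays on. The sketch below is hypothetical (the names are not OpenNMT's, and it is not copied from Optim.lua); it only shows that such a "sticky" flag explains why epoch 6 drops to 0.25 even though the epoch 5 validation perplexity improved. A rule that decays only when perplexity fails to improve would instead re-evaluate the condition every epoch.

```lua
-- Hypothetical scheduler reproducing the "sticky" behaviour in the logs:
-- once decay is triggered (validation perplexity went up, or the epoch has
-- reached startDecayAt), the rate is multiplied by `decay` after every epoch.
local function makeScheduler(lr, decay, startDecayAt)
  local sched = {lr = lr, startDecay = false, prevPpl = nil}
  -- Call at the end of an epoch; returns the rate for the next epoch.
  function sched.update(epoch, validPpl)
    if epoch >= startDecayAt then
      sched.startDecay = true
    end
    if sched.prevPpl and validPpl > sched.prevPpl then
      sched.startDecay = true
    end
    if sched.startDecay then
      sched.lr = sched.lr * decay
    end
    sched.prevPpl = validPpl
    return sched.lr
  end
  return sched
end

-- Validation perplexities for epochs 3-5 are from the log above; the values
-- for epochs 1-2 are made up (any decreasing values behave the same).
-- startDecayAt = 100 so that only the perplexity rule fires.
local s = makeScheduler(1.0, 0.5, 100)
local ppls = {5000, 3000, 1716.26, 1804.20, 963.50}
for epoch, ppl in ipairs(ppls) do
  print(("rate for epoch %d: %.4f"):format(epoch + 1, s.update(epoch, ppl)))
end
-- epochs 2-4: 1.0000, epoch 5: 0.5000, epoch 6: 0.2500
```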

Log error occurred at training start

/root/torch/install/bin/luajit: ./onmt/utils/Log.lua:7: attempt to call method 'write' (a nil value)
stack traceback:
./onmt/utils/Log.lua:7: in function 'logJsonRecursive'
./onmt/utils/Log.lua:27: in function 'logJson'
train.lua:516: in function 'main'
train.lua:554: in main chunk
[C]: in function 'dofile'
/root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

Docker image lacks a module

Hi, some errors occur when I use the pulled OpenNMT Docker image. Since I am not used to Lua, this error is hard for me to resolve. Can anybody help me?

root@5b63449e0fe8:/home/ww110750/OpenNMT# th preprocess.lua -train_src data/src-train.txt -train_tgt data/tgt-train.txt -valid_src data/src-val.txt -valid_tgt data/tgt-val.txt -save_data data/demo
/root/torch/install/bin/luajit: /root/torch/install/share/lua/5.1/trepl/init.lua:389: /root/torch/install/share/lua/5.1/trepl/init.lua:389: /root/torch/install/share/lua/5.1/trepl/init.lua:389: /root/torch/install/share/lua/5.1/trepl/init.lua:389: module 'tds' not found:No LuaRocks module found for tds
no field package.preload['tds']
no file '/root/.luarocks/share/lua/5.1/tds.lua'
no file '/root/.luarocks/share/lua/5.1/tds/init.lua'
no file '/root/torch/install/share/lua/5.1/tds.lua'
no file '/root/torch/install/share/lua/5.1/tds/init.lua'
no file './tds.lua'
no file '/root/torch/install/share/luajit-2.1.0-beta1/tds.lua'
no file '/usr/local/share/lua/5.1/tds.lua'
no file '/usr/local/share/lua/5.1/tds/init.lua'
no file '/root/.luarocks/lib/lua/5.1/tds.so'
no file '/root/torch/install/lib/lua/5.1/tds.so'
no file '/root/torch/install/lib/tds.so'
no file './tds.so'
no file '/usr/local/lib/lua/5.1/tds.so'
no file '/usr/local/lib/lua/5.1/loadall.so'
stack traceback:
[C]: in function 'error'
/root/torch/install/share/lua/5.1/trepl/init.lua:389: in function 'require'
preprocess.lua:1: in main chunk
[C]: in function 'dofile'
/root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

PANIC: unprotected error in call to Lua API (not enough memory)

I am trying to preprocess data with 15306038 training sentences (source and target) and I get a "not enough memory" error. My instance has 160GB of memory, and less than 10% of it is in use when the error occurs. I am using Lua 5.1.

Any help will be greatly appreciated

Thanks

Multiple identical sentences in beam search n_best output

I trained a homology MT model using OpenNMT. The translations contain identical outputs when I set n_best to 3 in translate.lua. In the output below (Chinese; SENT 5 asks roughly "what medicine to take for capillary thrombosis" and SENT 6 reads "generator longitudinal axis and horizontal axis"), the BEST HYP log shows the same hypothesis with different prediction scores. Is there a parameter I am missing in the decoding stage?

SENT 5: 毛 细 血管 血栓 吃 什么 药
PRED 5: 毛 细 血管 血栓 吃 什么 药 好
PRED SCORE: -1.8767

BEST HYP:
[-1.8767] 毛 细 血管 血栓 吃 什么 药 好
[-3.4304] 毛 细 血管 血栓 吃 什么 药
[-3.6762] 毛 细 血管 血栓 吃 什么 药

SENT 6: 发电 机 纵轴 和 横轴
PRED 6: 发电 机
PRED SCORE: -2.6976

BEST HYP:
[-2.6976] 发电 机
[-2.8108] 发电 机
[-3.7468] 发电 机
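If distinct hypotheses are what you need, one workaround (not an OpenNMT option, and not a fix for whatever produces the duplicates) is to decode with a larger n_best and drop repeated strings afterwards. The sketch below assumes each hypothesis has already been collected into a table with the hypothetical fields `score` and `tokens`, sorted best-first like the BEST HYP log; `dedupNBest` is an illustrative name, not part of OpenNMT.

```lua
-- Hypothetical post-processing helper (not part of OpenNMT).
-- Keeps only the first, i.e. best-scoring, occurrence of each distinct
-- hypothesis string. Assumes entries look like {score = <number>, tokens = "<text>"}
-- and are already sorted by descending score.
local function dedupNBest(nbest)
  local seen = {}
  local unique = {}
  for _, hyp in ipairs(nbest) do
    if not seen[hyp.tokens] then
      seen[hyp.tokens] = true
      table.insert(unique, hyp)
    end
  end
  return unique
end

-- Example using the scores reported for SENT 5 above.
local sent5 = {
  {score = -1.8767, tokens = "毛 细 血管 血栓 吃 什么 药 好"},
  {score = -3.4304, tokens = "毛 细 血管 血栓 吃 什么 药"},
  {score = -3.6762, tokens = "毛 细 血管 血栓 吃 什么 药"},
}
for _, hyp in ipairs(dedupNBest(sent5)) do
  print(string.format("[%.4f] %s", hyp.score, hyp.tokens))
end
```

Because duplicates are removed after decoding, the result can contain fewer than n_best entries.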

Cannot use `translate.lua` without `-gpuid`

Hi,

translate.lua without -gpuid

Following the guide on GitHub (I have noticed and reported some differences between the documentation on GitHub and on opennmt.net), I ran:

th translate.lua <some_path>.t7 -src <data_path>.txt -output pred.txt
which gives:

Loading '<path>.t7'...                             
/home/<user>/torch/install/bin/luajit: /home/<user>/torch/install/share/lua/5.1/torch/File.lua:343: unknown Torch class <torch.CudaTensor>
stack traceback:
  [C]: in function 'error'
  /home/<user>/torch/install/share/lua/5.1/torch/File.lua:343: in function 'readObject'
  /home/<user>/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
  /home/<user>/torch/install/share/lua/5.1/nn/Module.lua:184: in function 'read'
  /home/<user>/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
  /home/<user>/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
  /home/<user>/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
  /home/<user>/torch/install/share/lua/5.1/torch/File.lua:353: in function 'readObject'
  /home/<user>/torch/install/share/lua/5.1/torch/File.lua:368: in function 'readObject'
  /home/<user>/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
  /home/<user>/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
  ...
  /home/<user>/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
  /home/<user>/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
  /home/<user>/torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
  ./onmt/translate/Translator.lua:29: in function '__init'
  /home/<user>/torch/install/share/lua/5.1/torch/init.lua:91: in function 'new'
  translate.lua:65: in function 'main'
  translate.lua:193: in main chunk
  [C]: in function 'dofile'
  ...<user>/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
  [C]: at 0x00406670

translate.lua with -gpuid 1

th translate.lua <some_path_>.t7 -src <data_path>.txt -output pred.txt -gpuid 1
It runs! :)

I do want to run on the GPU anyway, but I'm not sure whether this is related to my setup, an actual bug, or an undocumented requirement.

Hope it helps
pltrdy

How to process large training data that does not fit in memory?

Hi, I need to train a model on a huge training set of 200 million sentence pairs. preprocess.lua converts all the training data into a single data file, which train.lua then loads. How can train.lua instead load subsets of the training data iteratively? Loading all the data at once would exhaust the machine's memory.
Thanks in advance.
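A minimal sketch, assuming the common workaround of splitting the corpus into smaller aligned shards that preprocess.lua can handle one at a time. It streams both files line by line, so the full corpus is never held in memory. `shardParallel` and the `<prefix>.<n>.src` / `<prefix>.<n>.tgt` naming are hypothetical, not part of OpenNMT, and whether train.lua can then iterate over several preprocessed shards is not settled here.

```lua
-- Hypothetical helper (not part of OpenNMT): split a parallel corpus into
-- aligned shards of at most shardSize sentence pairs, reading line by line.
-- Returns the number of shards written and the total number of pairs.
local function shardParallel(srcPath, tgtPath, prefix, shardSize)
  local src = assert(io.open(srcPath, "r"))
  local tgt = assert(io.open(tgtPath, "r"))
  local shard, count = 0, 0
  local srcOut, tgtOut
  for srcLine in src:lines() do
    local tgtLine = tgt:read("*l")
    if tgtLine == nil then
      error("target file has fewer lines than source file")
    end
    -- Start a new pair of shard files every shardSize lines.
    if count % shardSize == 0 then
      if srcOut then srcOut:close(); tgtOut:close() end
      shard = shard + 1
      srcOut = assert(io.open(prefix .. "." .. shard .. ".src", "w"))
      tgtOut = assert(io.open(prefix .. "." .. shard .. ".tgt", "w"))
    end
    srcOut:write(srcLine, "\n")
    tgtOut:write(tgtLine, "\n")
    count = count + 1
  end
  if srcOut then srcOut:close(); tgtOut:close() end
  -- Any leftover target line means the two files were not aligned.
  local extra = tgt:read("*l")
  src:close(); tgt:close()
  if extra ~= nil then
    error("target file has more lines than source file")
  end
  return shard, count
end
```

Each shard could then be preprocessed on its own, with -train_src/-train_tgt pointing at its files and a separate -save_data. Passing the same previously built -src_vocab and -tgt_vocab to every run (both options appear in the usage listing below) would keep the vocabularies consistent across shards.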

Typos in http://opennmt.net//Guide/

Following the guide, I tried to preprocess the data using:
th preprocess.lua -train_src data/src-train.txt -train_tgt data/tgt-train.txt -valid_src data/src-val.txt -valid_tgt data/tgt-val.txt -output data/demo

And got the following error:

invalid argument: -output	
Usage: [options] 

preprocess.lua

**Preprocess Options**


  -config                 Read options from this file []
  -train_src              Path to the training source data []
  -train_tgt              Path to the training target data []
  -valid_src              Path to the validation source data []
  -valid_tgt              Path to the validation target data []
  -save_data              Output file for the prepared data []
  -src_vocab_size         Size of the source vocabulary [50000]
  -tgt_vocab_size         Size of the target vocabulary [50000]
  -src_vocab              Path to an existing source vocabulary []
  -tgt_vocab              Path to an existing target vocabulary []
  -features_vocabs_prefix Path prefix to existing features vocabularies []
  -src_seq_length         Maximum source sequence length [50]
  -tgt_seq_length         Maximum target sequence length [50]
  -shuffle                Shuffle data [1]
  -seed                   Random seed [3435]
  -report_every           Report status every this many sentences [100000]

pltrdy

Question about Sequencer.lua

I've implemented a recursive network and initialized a Sequencer with it (and also the memory optimizer).
The source code is:

require('nngraph')
local RVNN, parent = torch.class('onmt.RVNN', 'nn.Container')

function RVNN:__init (outSize, relDim, numRel, dropout)
  parent.__init(self)
  self.outSize = outSize
  self.relDim = relDim
  self.numRel = numRel
  self.dropout = dropout
  self.net = self:_buildModel()
  self:add(self.net)
end

function RVNN:_buildModel ()
  local model = nn.Linear(self.outSize*2+self.relDim, self.outSize, true)
  local emb = nn.LookupTable(self.numRel, self.relDim)
  local inputs = {nn.Identity()(), nn.Identity()(), nn.Identity()()}
  local rel = emb(inputs[3])
  local proj = nn.JoinTable(2)({inputs[1], inputs[2], rel})
  if self.dropout > 0 then
    proj = onmt.BayesianDropout(self.dropout, 'recursive')(proj)
  end
  local out = nn.Tanh()(model(proj))
  return nn.gModule(inputs, {out})
end

function RVNN:updateOutput(input)
  self.output = self.net:updateOutput(input)
  return self.output
end

function RVNN:updateGradInput(input, gradOutput)
  return self.net:updateGradInput(input, gradOutput)
end

function RVNN:accGradParameters(input, gradOutput, scale)
  return self.net:accGradParameters(input, gradOutput, scale)
end

However, I found that it returns a gradient with zero dimensions.
I had to change the updateGradInput function to:

function RVNN:updateGradInput(input, gradOutput)
  self.gradInput = self.net:updateGradInput(input, gradOutput)
  return self.gradInput
end

which is not necessary in LSTM.lua.
I can't find any difference between a Sequencer built on LSTM and one built on my recursive net.
In the current Sequencer implementation, how is self.gradInput redirected to self.net.gradInput?
Thanks.
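In Torch's nn package, Module:backward() calls updateGradInput() and then returns self.gradInput, not the value updateGradInput() returned. If the Sequencer or the memory optimizer reaches the module through backward(), or reads module.gradInput directly (an assumption here, not checked against Sequencer.lua), a wrapper that only returns the inner net's gradient leaves its own self.gradInput as the empty value it started with. The plain-Lua mock below reproduces that contract without Torch; `Module`, `newInner` and `newWrapper` are hypothetical names, and plain tables stand in for tensors.

```lua
-- Plain-Lua mock of the nn.Module backward() contract (no Torch required).
local Module = {}
Module.__index = Module

function Module.new()
  -- Starts with an empty gradInput, like a freshly constructed module.
  return setmetatable({gradInput = {}}, Module)
end

-- Mirrors nn.Module:backward: the return value of updateGradInput is ignored.
function Module:backward(input, gradOutput, scale)
  self:updateGradInput(input, gradOutput)
  self:accGradParameters(input, gradOutput, scale or 1)
  return self.gradInput
end

function Module:accGradParameters() end

-- Inner network: stores its result in self.gradInput, as nn modules do.
local function newInner()
  local m = Module.new()
  function m:updateGradInput(input, gradOutput)
    self.gradInput = {gradOutput[1] * 2} -- stand-in computation
    return self.gradInput
  end
  return m
end

-- Wrapper like RVNN: optionally copies the inner gradInput into self.gradInput.
local function newWrapper(assignGradInput)
  local w = Module.new()
  w.net = newInner()
  function w:updateGradInput(input, gradOutput)
    local g = self.net:updateGradInput(input, gradOutput)
    if assignGradInput then self.gradInput = g end
    return g
  end
  return w
end

local broken = newWrapper(false):backward({1}, {3})
local fixed = newWrapper(true):backward({1}, {3})
print(#broken, #fixed) -- prints 0 and 1
```

The "broken" wrapper corresponds to the original RVNN:updateGradInput, and the "fixed" one to the modified version that assigns self.gradInput.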

Re Teacher Forcing

This isn't necessarily an "issue", but I would like to clarify whether this parameter enables teacher forcing:
"inputFeed - bool, enable input feeding"

Stuck on "preparing memory optimization"

Preparing memory optimization...	
 * sharing 63% of output/gradInput tensors memory between clones

MacOS v.10.12.1 (Sierra)
MacBook Pro (Mid 2010)
Processor: 2.4 GHz Intel Core 2 Duo
Memory: 4 GB 1067 MHz DDR3
Graphics: NVIDIA GeForce 320M 256 MB

No changes made to any of the training data.

How do I work around this? Is it a hardware config problem or a bug?

Error when using multiple GPUs

The command I run is:
th train.lua -data en-fr/en2fr-train.t7 -save_model model -gpuid 1,3,4

and it fails with the error: invalid type for option -gpuid (should be number)

How can I solve this?
