edinburghnlp / nematus

Open-Source Neural Machine Translation in Tensorflow

License: BSD 3-Clause "New" or "Revised" License

Python 83.99% Shell 1.29% Perl 5.42% Smalltalk 0.35% Emacs Lisp 3.15% JavaScript 3.15% NewLisp 0.29% Ruby 0.30% Slash 0.07% SystemVerilog 0.03% Hack 1.96%
neural-machine-translation sequence-to-sequence machine-translation nmt mt

nematus's Introduction

NEMATUS

Attention-based encoder-decoder model for neural machine translation built in Tensorflow.

Notable features include:

SUPPORT

For general support requests, there is a Google Groups mailing list at https://groups.google.com/d/forum/nematus-support . You can also send an e-mail to [email protected] .

INSTALLATION

Nematus requires the following packages:

  • Python 3 (tested on version 3.5.2)
  • TensorFlow 1.15 / 2.X (tested on version 2.0)

To install TensorFlow, we recommend following the steps at https://www.tensorflow.org/install/

The following packages are optional, but highly recommended:

  • CUDA >= 7 (only GPU training is sufficiently fast)
  • cuDNN >= 4 (speeds up training substantially)

LEGACY THEANO VERSION

Nematus originated as a fork of dl4mt-tutorial by Kyunghyun Cho et al. ( https://github.com/nyu-dl/dl4mt-tutorial ), and was implemented in Theano. See https://github.com/EdinburghNLP/nematus/tree/theano for this Theano-based version of Nematus.

To use models trained with Theano with the current Tensorflow codebase, use the script nematus/theano_tf_convert.py.

DOCKER USAGE

You can also build a Docker image by running the following command, replacing suffix with either cpu or gpu:

docker build -t nematus-docker -f Dockerfile.suffix .

To run a CPU docker instance with the current working directory shared with the Docker container, execute:

docker run -v `pwd`:/playground -it nematus-docker

For GPU you need to have nvidia-docker installed and run:

nvidia-docker run -v `pwd`:/playground -it nematus-docker

TRAINING SPEED

Training speed depends heavily on having appropriate hardware (ideally a recent NVIDIA GPU), and having installed the appropriate software packages.

To test your setup, we provide some speed benchmarks with test/test_train.sh, measured on an Intel Xeon CPU E5-2620 v4 with an NVIDIA GeForce GTX Titan X (Pascal) and CUDA 9.0:

GPU, CuDNN 5.1, tensorflow 1.0.1:

CUDA_VISIBLE_DEVICES=0 ./test_train.sh

225.25 sentences/s

USAGE INSTRUCTIONS

All of the scripts below can be run with the --help flag to get usage information.

Sample commands with toy examples are available in the test directory; for training a full-scale RNN system, consider the training scripts at http://data.statmt.org/wmt17_systems/training/

An updated version of these scripts that uses the Transformer model can be found at https://github.com/EdinburghNLP/wmt17-transformer-scripts

nematus/train.py : use to train a new model

data sets; model loading and saving

parameter description
--source_dataset PATH parallel training corpus (source)
--target_dataset PATH parallel training corpus (target)
--dictionaries PATH [PATH ...] network vocabularies (one per source factor, plus target vocabulary)
--save_freq INT save frequency (default: 30000)
--model PATH model file name (default: model)
--reload PATH load existing model from this path. Set to "latest_checkpoint" to reload the latest checkpoint in the same directory of --model
--no_reload_training_progress don't reload training progress (only used if --reload is enabled)
--summary_dir PATH directory for saving summaries (default: same directory as the --model file)
--summary_freq INT Save summaries after INT updates, if 0 do not save summaries (default: 0)

network parameters (all model types)

parameter description
--model_type {rnn,transformer} model type (default: rnn)
--embedding_size INT embedding layer size (default: 512)
--state_size INT hidden state size (default: 1000)
--source_vocab_sizes INT [INT ...] source vocabulary sizes (one per input factor) (default: None)
--target_vocab_size INT target vocabulary size (default: -1)
--factors INT number of input factors (default: 1) - CURRENTLY ONLY WORKS FOR 'rnn' MODEL
--dim_per_factor INT [INT ...] list of word vector dimensionalities (one per factor): '--dim_per_factor 250 200 50' for total dimensionality of 500 (default: None)
--tie_encoder_decoder_embeddings tie the input embeddings of the encoder and the decoder (first factor only). Source and target vocabulary size must be the same
--tie_decoder_embeddings tie the input embeddings of the decoder with the softmax output embeddings
--output_hidden_activation {tanh,relu,prelu,linear} activation function in hidden layer of the output network (default: tanh) - CURRENTLY ONLY WORKS FOR 'rnn' MODEL
--softmax_mixture_size INT number of softmax components to use (default: 1) - CURRENTLY ONLY WORKS FOR 'rnn' MODEL

network parameters (rnn-specific)

parameter description
--rnn_enc_depth INT number of encoder layers (default: 1)
--rnn_enc_transition_depth INT number of GRU transition operations applied in the encoder. Minimum is 1. (Only applies to gru). (default: 1)
--rnn_dec_depth INT number of decoder layers (default: 1)
--rnn_dec_base_transition_depth INT number of GRU transition operations applied in the first layer of the decoder. Minimum is 2. (Only applies to gru_cond). (default: 2)
--rnn_dec_high_transition_depth INT number of GRU transition operations applied in the higher layers of the decoder. Minimum is 1. (Only applies to gru). (default: 1)
--rnn_dec_deep_context pass context vector (from first layer) to deep decoder layers
--rnn_dropout_embedding FLOAT dropout for input embeddings (0: no dropout) (default: 0.0)
--rnn_dropout_hidden FLOAT dropout for hidden layer (0: no dropout) (default: 0.0)
--rnn_dropout_source FLOAT dropout source words (0: no dropout) (default: 0.0)
--rnn_dropout_target FLOAT dropout target words (0: no dropout) (default: 0.0)
--rnn_layer_normalisation Set to use layer normalization in encoder and decoder
--rnn_lexical_model Enable feedforward lexical model (Nguyen and Chiang, 2018)

network parameters (transformer-specific)

parameter description
--transformer_enc_depth INT number of encoder layers (default: 6)
--transformer_dec_depth INT number of decoder layers (default: 6)
--transformer_ffn_hidden_size INT inner dimensionality of feed-forward sub-layers (default: 2048)
--transformer_num_heads INT number of attention heads used in multi-head attention (default: 8)
--transformer_dropout_embeddings FLOAT dropout applied to sums of word embeddings and positional encodings (default: 0.1)
--transformer_dropout_residual FLOAT dropout applied to residual connections (default: 0.1)
--transformer_dropout_relu FLOAT dropout applied to the internal activation of the feed-forward sub-layers (default: 0.1)
--transformer_dropout_attn FLOAT dropout applied to attention weights (default: 0.1)
--transformer_drophead FLOAT dropout of entire attention heads (default: 0.0)

training parameters

parameter description
--loss_function {cross-entropy,per-token-cross-entropy,MRT} loss function. MRT: Minimum Risk Training (https://www.aclweb.org/anthology/P/P16/P16-1159.pdf) (default: cross-entropy)
--decay_c FLOAT L2 regularization penalty (default: 0.0)
--map_decay_c FLOAT MAP-L2 regularization penalty towards original weights (default: 0.0)
--prior_model PATH Prior model for MAP-L2 regularization. Unless using "--reload", this will also be used for initialization.
--clip_c FLOAT gradient clipping threshold (default: 1.0)
--label_smoothing FLOAT label smoothing (default: 0.0)
--exponential_smoothing FLOAT exponential smoothing factor; use 0 to disable (default: 0.0)
--optimizer {adam} optimizer (default: adam)
--adam_beta1 FLOAT exponential decay rate for the first moment estimates (default: 0.9)
--adam_beta2 FLOAT exponential decay rate for the second moment estimates (default: 0.999)
--adam_epsilon FLOAT constant for numerical stability (default: 1e-08)
--learning_schedule {constant,transformer,warmup-plateau-decay} learning schedule (default: constant)
--learning_rate FLOAT learning rate (default: 0.0001)
--warmup_steps INT number of initial updates during which the learning rate is increased linearly during learning rate scheduling (default: 8000)
--plateau_steps INT number of updates after warm-up before the learning rate starts to decay (applies to 'warmup-plateau-decay' learning schedule only). (default: 0)
--maxlen INT maximum sequence length for training and validation (default: 100)
--batch_size INT minibatch size (default: 80)
--token_batch_size INT minibatch size (expressed in number of source or target tokens). Sentence-level minibatch size will be dynamic. If this is enabled, batch_size only affects sorting by length. (default: 0)
--max_sentences_per_device INT maximum size of minibatch subset to run on a single device, in number of sentences (default: 0)
--max_tokens_per_device INT maximum size of minibatch subset to run on a single device, in number of tokens (either source or target - whichever is highest) (default: 0)
--gradient_aggregation_steps INT number of times to accumulate gradients before aggregating and applying; the minibatch is split between steps, so adding more steps allows larger minibatches to be used (default: 1)
--maxibatch_size INT size of maxibatch (number of minibatches that are sorted by length) (default: 20)
--no_sort_by_length do not sort sentences in maxibatch by length
--no_shuffle disable shuffling of training data (for each epoch)
--keep_train_set_in_memory Keep training dataset lines stored in RAM during training
--max_epochs INT maximum number of epochs (default: 5000)
--finish_after INT maximum number of updates (minibatches) (default: 10000000)
--print_per_token_pro PATH path for storing the probability of each target token given the source sentences over the training dataset (without training). If set to False, the function will not be triggered (default: False). Remove the trailing 1.0 values at the end of each list; they are the probabilities of padding.
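
How --batch_size, --maxibatch_size and length sorting fit together can be illustrated with a minimal sketch. This is not the Nematus data iterator: the helper name `minibatches`, the `Pair` type and the choice to sort by source length alone are assumptions made for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

Pair = Tuple[List[str], List[str]]  # (source tokens, target tokens)


def minibatches(pairs: Iterable[Pair], batch_size: int = 80,
                maxibatch_size: int = 20,
                sort_by_length: bool = True) -> Iterator[List[Pair]]:
    """Yield minibatches cut from length-sorted maxibatches.

    Hypothetical helper: buffers batch_size * maxibatch_size sentence
    pairs, sorts the buffer by source length, then slices it.
    """
    buffer: List[Pair] = []
    capacity = batch_size * maxibatch_size

    def flush() -> Iterator[List[Pair]]:
        if sort_by_length:
            buffer.sort(key=lambda pair: len(pair[0]))
        for start in range(0, len(buffer), batch_size):
            yield buffer[start:start + batch_size]  # slice is a copy
        buffer.clear()

    for pair in pairs:
        buffer.append(pair)
        if len(buffer) == capacity:
            yield from flush()
    if buffer:  # last, possibly smaller, maxibatch
        yield from flush()
```

Sorting inside a maxibatch groups sentences of similar length, so less padding is needed per minibatch; passing sort_by_length=False mirrors the effect described for --no_sort_by_length.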

minimum risk training parameters (MRT)

parameter description
--mrt_reference add the reference to the MRT candidate sentences (default: False)
--mrt_alpha FLOAT MRT alpha to control the sharpness of the distribution of sampled subspace (default: 0.005)
--samplesN INT the number of sampled candidate sentences per source sentence (default: 100)
--mrt_loss evaluation metrics used to compute loss between the candidate translation and reference translation (default: SENTENCEBLEU n=4)
--mrt_ml_mix FLOAT mix in MLE objective in MRT training with this scaling factor (default: 0)
--sample_way {beam_search, randomly_sample} the sampling strategy used to generate candidate sentences (default: beam_search)
--max_len_a INT generate candidate sentences with maximum length: ax + b, where x is the length of the source sentence (default: 1.5)
--max_len_b INT generate candidate sentences with maximum length: ax + b, where x is the length of the source sentence (default: 5)
--max_sentences_of_sampling INT maximum number of source sentences for which candidate sentences are generated at one time (limited by device memory capacity) (default: 0)
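
The roles of --mrt_alpha and the ax + b length cap can be sketched as follows. The sketch follows the formulation in the MRT paper linked under --loss_function, where the candidate distribution is proportional to the model probability raised to alpha and renormalised over the sampled candidates. The function names, the integer truncation of the length cap and the example cost values are assumptions; this is not the Nematus code.

```python
import math
from typing import List


def max_candidate_length(source_length: int, a: float = 1.5, b: int = 5) -> int:
    """Length cap a*x + b for sampled candidates (x = source length).

    Truncating to an integer is an assumption of this sketch.
    """
    return int(a * source_length + b)


def expected_risk(log_probs: List[float], costs: List[float],
                  alpha: float = 0.005) -> float:
    """Expected cost under Q(y) proportional to P(y)**alpha.

    log_probs: model log-probabilities of the sampled candidates.
    costs: per-candidate cost, e.g. 1 - sentence-level BLEU (assumed).
    """
    scaled = [alpha * lp for lp in log_probs]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return sum(w / total * c for w, c in zip(weights, costs))
```

A small alpha flattens the distribution, so every candidate contributes to the risk almost equally; a larger alpha concentrates the weight on the candidates the model already prefers.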

validation parameters

parameter description
--valid_source_dataset PATH source validation corpus (default: None)
--valid_target_dataset PATH target validation corpus (default: None)
--valid_batch_size INT validation minibatch size (default: 80)
--valid_token_batch_size INT validation minibatch size (expressed in number of source or target tokens). Sentence-level minibatch size will be dynamic. If this is enabled, valid_batch_size only affects sorting by length. (default: 0)
--valid_freq INT validation frequency (default: 10000)
--valid_script PATH path to script for external validation (default: None). The script will be passed an argument specifying the path of a file that contains translations of the source validation corpus. It must write a single score to standard output.
--valid_bleu_source_dataset PATH source validation corpus for external validation (default: None). If set to None, the dataset for calculating validation loss (valid_source_dataset) will be used
--patience INT early stopping patience (default: 10)
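
The contract described for --valid_script (one argument with the path of the translations file, a single score written to standard output) could be met by a script along these lines. This is a sketch only: the exact-match metric, the function names and the reference path `dev.ref.txt` are hypothetical, and a real setup would typically compute a proper metric such as BLEU.

```python
import sys
from typing import List


def read_lines(path: str) -> List[str]:
    """Read a text file into a list of stripped lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle]


def exact_match_rate(hypotheses: List[str], references: List[str]) -> float:
    """Toy metric: fraction of hypotheses identical to their reference."""
    if not references:
        return 0.0
    hits = sum(h == r for h, r in zip(hypotheses, references))
    return hits / len(references)


# Hypothetical location of the reference translations.
REFERENCE_PATH = "dev.ref.txt"

if __name__ == "__main__" and len(sys.argv) == 2:
    translations = read_lines(sys.argv[1])  # path passed in by the trainer
    score = exact_match_rate(translations, read_lines(REFERENCE_PATH))
    print(score)  # the single score expected on standard output
```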

display parameters

parameter description
--disp_freq INT display loss after INT updates (default: 1000)
--sample_freq INT display some samples after INT updates (default: 10000)
--beam_freq INT display some beam_search samples after INT updates (default: 10000)
--beam_size INT size of the beam (default: 12)

translate parameters

parameter description
--normalization_alpha [ALPHA] normalize scores by sentence length (with argument, exponentiate lengths by ALPHA)
--n_best Print full beam
--translation_maxlen INT Maximum length of translation output sentence (default: 200)
--translation_strategy {beam_search,sampling} translation strategy, either beam_search or sampling (default: beam_search)
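
As a usage sketch, a training command can be assembled from the flags documented above. The helper name `build_train_command` and all file paths are hypothetical; only flags listed in this section are used, and the `python3` interpreter name is an assumption about the local setup.

```python
from typing import List


def build_train_command(source: str, target: str, dictionaries: List[str],
                        model: str = "model",
                        model_type: str = "transformer") -> List[str]:
    """Assemble an argument list for nematus/train.py from documented flags."""
    if model_type not in ("rnn", "transformer"):
        raise ValueError("model_type must be 'rnn' or 'transformer'")
    return [
        "python3", "nematus/train.py",
        "--source_dataset", source,
        "--target_dataset", target,
        "--dictionaries", *dictionaries,  # one per source factor, plus target
        "--model", model,
        "--model_type", model_type,
    ]


# Hypothetical file names, for illustration only.
command = build_train_command("train.bpe.de", "train.bpe.en",
                              ["vocab.de.json", "vocab.en.json"])
print(" ".join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)` from the repository root.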

nematus/translate.py : use an existing model to translate a source text

parameter description
-v, --verbose verbose mode
-m PATH [PATH ...], --models PATH [PATH ...] model to use; provide multiple models (with same vocabulary) for ensemble decoding
-b INT, --minibatch_size INT minibatch size (default: 80)
-i PATH, --input PATH input file (default: standard input)
-o PATH, --output PATH output file (default: standard output)
-k INT, --beam_size INT beam size (default: 5)
-n [ALPHA], --normalization_alpha [ALPHA] normalize scores by sentence length (with argument, exponentiate lengths by ALPHA)
--n_best write n-best list (of size k)
--maxibatch_size INT size of maxibatch (number of minibatches that are sorted by length) (default: 20)
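
The effect of --normalization_alpha can be sketched in a few lines. The sketch assumes that a hypothesis score is a cost (negative log-probability, lower is better), that every hypothesis has at least one token, and that the cost is divided by length raised to ALPHA; the function names are illustrative and not taken from the Nematus code.

```python
from typing import List, Tuple


def normalized_cost(cost: float, length: int, alpha: float = 1.0) -> float:
    """Divide a hypothesis cost by length**alpha (alpha = 0 changes nothing)."""
    return cost / (length ** alpha)


def best_hypothesis(hypotheses: List[Tuple[List[str], float]],
                    alpha: float = 1.0) -> List[str]:
    """Return the token list with the lowest length-normalized cost."""
    return min(hypotheses,
               key=lambda hyp: normalized_cost(hyp[1], len(hyp[0]), alpha))[0]


# Made-up costs: the longer hypothesis has the higher raw cost.
candidates = [(["a"], 1.0), (["a", "b", "c"], 2.4)]
print(best_hypothesis(candidates, alpha=0.0))  # raw cost decides
print(best_hypothesis(candidates, alpha=1.0))  # per-token cost decides
```

Without normalization, shorter hypotheses tend to win because every additional token adds cost; dividing by the length counteracts that bias.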

nematus/score.py : use an existing model to score a parallel corpus

parameter description
-v, --verbose verbose mode
-m PATH [PATH ...], --models PATH [PATH ...] model to use; provide multiple models (with same vocabulary) for ensemble decoding
-b INT, --minibatch_size INT minibatch size (default: 80)
-n [ALPHA], --normalization_alpha [ALPHA] normalize scores by sentence length (with argument, exponentiate lengths by ALPHA)
-o PATH, --output PATH output file (default: standard output)
-s PATH, --source PATH source text file
-t PATH, --target PATH target text file

nematus/rescore.py : use an existing model to rescore an n-best list.

The n-best list is assumed to have the same format as Moses:

sentence-ID (starting from 0) ||| translation ||| scores

New scores will be appended to the end. rescore.py has the same arguments as score.py, with the exception of this additional parameter:

parameter description
-i PATH, --input PATH input n-best list file (default: standard input)
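
The n-best format described above can be handled with a short sketch like the following. It assumes exactly the three-field layout shown (sentence ID, translation, scores) and is not the parsing code used by rescore.py; both function names are hypothetical.

```python
from typing import List, Tuple


def parse_nbest_line(line: str) -> Tuple[int, str, List[str]]:
    """Split 'sentence-ID ||| translation ||| scores' into its three fields."""
    fields = [field.strip() for field in line.rstrip("\n").split("|||")]
    if len(fields) < 3:
        raise ValueError("expected at least three '|||'-separated fields")
    return int(fields[0]), fields[1], fields[2].split()


def append_score(line: str, new_score: float) -> str:
    """Return the line with new_score added after the existing scores."""
    return "{} {}".format(line.rstrip("\n").rstrip(), new_score)
```

Scores are kept as strings when parsing so that the original formatting survives a round trip.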

nematus/theano_tf_convert.py : convert an existing theano model to a tensorflow model

If you have a Theano model (model.npz) whose network architecture features are currently supported, you can convert it into a TensorFlow model using nematus/theano_tf_convert.py.

parameter description
--from_theano convert from Theano to TensorFlow format
--from_tf convert from Tensorflow to Theano format
--in PATH path to input model
--out PATH path to output model

PUBLICATIONS

If you use Nematus, please cite the following paper:

Rico Sennrich, Orhan Firat, Kyunghyun Cho, Alexandra Birch, Barry Haddow, Julian Hitschler, Marcin Junczys-Dowmunt, Samuel Läubli, Antonio Valerio Miceli Barone, Jozef Mokry and Maria Nadejde (2017): Nematus: a Toolkit for Neural Machine Translation. In Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics, Valencia, Spain, pp. 65-68.

@InProceedings{sennrich-EtAl:2017:EACLDemo,
  author    = {Sennrich, Rico  and  Firat, Orhan  and  Cho, Kyunghyun  and  Birch, Alexandra  and  Haddow, Barry  and  Hitschler, Julian  and  Junczys-Dowmunt, Marcin  and  L\"{a}ubli, Samuel  and  Miceli Barone, Antonio Valerio  and  Mokry, Jozef  and  Nadejde, Maria},
  title     = {Nematus: a Toolkit for Neural Machine Translation},
  booktitle = {Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics},
  month     = {April},
  year      = {2017},
  address   = {Valencia, Spain},
  publisher = {Association for Computational Linguistics},
  pages     = {65--68},
  url       = {http://aclweb.org/anthology/E17-3017}
}

The code is based on the following models:

Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio (2015): Neural Machine Translation by Jointly Learning to Align and Translate, Proceedings of the International Conference on Learning Representations (ICLR).

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin (2017): Attention is All You Need, Advances in Neural Information Processing Systems (NIPS).

Please refer to the Nematus paper for a description of implementation differences to the RNN model.

ACKNOWLEDGMENTS

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreements 645452 (QT21), 644333 (TraMOOC), 644402 (HimL) and 688139 (SUMMA).

nematus's People

Contributors

andre-martins, avmb, bachstelze, bhaddow, bricksdont, chaojun-wang, chozelinek, cshanbo, donglixp, emjotde, franck-dernoncourt, franckbrl, jakezhaojb, jeffreyjosanne, jozef-mokry, julianhitschler, jvamvas, kocmitom, kyunghyuncho, laeubli, lprieb, lxafly, m4t1ss, mbartoli, mnadejde, orhanf, pjwilliams, proyag, rsennrich, wen-li


nematus's Issues

Processes in deadlock on using -p 2 or more

In the translate_single.sh script, when I set the number of processes (-p) to 2 or more, I get the following output.

$model_dir/preprocess.sh | \
THEANO_FLAGS=mode=FAST_RUN,floatX=float32,device=$device python $nematus_home/nematus/translate.py \
     -m $model_dir/model.l2r.ens1.npz --suppress-unk \
     -k 5 -n -p 2  | \
$model_dir/postprocess.sh

Output:

Detokenizer Version $Revision: 4134 $
Language: en
Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 0.138 seconds.
Prefix dict has been built succesfully.
Using cuDNN version 6021 on context None
Mapped name None to device cuda: GeForce GTX TITAN X (0000:02:00.0)
Using cuDNN version 6021 on context None
Mapped name None to device cuda: GeForce GTX TITAN X (0000:02:00.0)
INFO: Waiting for existing lock by process '14569' (I am process '14570')
INFO: To manually release the lock, delete /home/himanshu/.theano/compiledir_Linux-4.4--generic-x86_64-with-debian-jessie-sid-x86_64-2.7.15-64/lock_dir
INFO: Waiting for existing lock by process '14570' (I am process '14569')
INFO: To manually release the lock, delete /home/himanshu/.theano/compiledir_Linux-4.4--generic-x86_64-with-debian-jessie-sid-x86_64-2.7.15-64/lock_dir
INFO: Waiting for existing lock by process '14570' (I am process '14569')
INFO: To manually release the lock, delete /home/himanshu/.theano/compiledir_Linux-4.4--generic-x86_64-with-debian-jessie-sid-x86_64-2.7.15-64/lock_dir
INFO: Waiting for existing lock by process '14570' (I am process '14569')
INFO: To manually release the lock, delete /home/himanshu/.theano/compiledir_Linux-4.4--generic-x86_64-with-debian-jessie-sid-x86_64-2.7.15-64/lock_dir

These two processes keep repeating these messages. This looks like a deadlock.
Let me know if I am missing something. If this is a bug, how can I fix it?

Cross-entropy broken for small training set or batches?

Hi,
when using the WMT de-en model to initialize a training run on a single-sentence corpus, I get very incorrect cross-entropy results. I am using only default settings, no dropout or anything. I verified that the model is being loaded correctly. The values in the final layer before it goes into cross-entropy are also correct.

For instance for the pair

das ist ein Test .
this is a test .

The cost should be around 0.61868, verified by scoring with Amun. Nematus in its current version from master produces 102 for the first forward step. When repeating this sentence 20 times and increasing the batch size accordingly, the cost becomes 111.15, which makes no sense as it should at least average to the same cost as for the single sentences.

After inspecting the cost vector manually, it seems only the first value is being calculated correctly. Can anyone confirm this? Or is something wrong with my setup?
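
The expectation stated above, that repeating a sentence in a batch should leave the average cost unchanged, can be checked with a small sketch. The token probabilities are made up and the two helper functions are illustrative only; they are not the Nematus loss code.

```python
import math
from typing import List


def sentence_cost(token_probs: List[float]) -> float:
    """Cross-entropy of one sentence: sum of -log p over its target tokens."""
    return -sum(math.log(p) for p in token_probs)


def batch_cost(batch: List[List[float]]) -> float:
    """Mean per-sentence cross-entropy over a batch."""
    return sum(sentence_cost(sentence) for sentence in batch) / len(batch)


# Made-up probabilities of the reference tokens, for illustration only.
probs = [0.9, 0.95, 0.9, 0.85, 0.99, 0.97]
single = batch_cost([probs])
repeated = batch_cost([probs] * 20)
print(single, repeated)  # the two values agree up to rounding
```

If an implementation reports a different mean for the repeated batch, the per-sentence costs after the first one are worth inspecting, which matches the observation about the cost vector.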

Different result after converting model from Theano to TF

Hi, I am experimenting with the pre-trained Nematus models for WMT'17, zh-en language pair.

I converted the pre-trained model to Tensorflow using this command: python nematus/theano_tf_convert.py --from_theano --in ../wmt17_systems/zh-en/model.l2r.ens1.npz --out ../wmt17_systems/zh-en/model-tf.l2r.ens1.npz.

I then ran a single-model translation using this command:

$model_dir/preprocess.sh | \
CUDA_VISIBLE_DEVICES=0 python $nematus_tf_home/nematus/translate.py \
      -m $model_dir/model-tf.l2r.ens1.npz \
      -k 12 -n -p 1  | \
$model_dir/postprocess.sh

I observed a quite significant drop in BLEU:

Theano version:
BLEU = 22.84, 56.7/29.1/17.0/10.4 (BP=0.982, ratio=0.982, hyp_len=52856, ref_len=53827)

Tensorflow version:
BLEU = 21.26, 53.8/26.8/15.4/9.2 (BP=1.000, ratio=1.044, hyp_len=56182, ref_len=53827)

By the way, when I run the conversion code, the following was printed:

Not saving decoder_c_tt because no TF equivalent
The following TF variables were not assigned (excluding Adam vars):
You should see only 'beta1_power', 'beta2_power' and 'time' variable listed
time:0
beta1_power:0
beta2_power:0

I also noticed that --suppress-unk is no longer available when calling translate.py. Anything I've missed here? thanks :)
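
To help read the two BLEU lines reported above, the following sketch recomputes the brevity penalty (BP) from the hypothesis and reference lengths using the standard BLEU definition, and combines it with the four n-gram precisions. The function names are illustrative; since the precisions in the report are rounded, the recomputed BLEU only approximates the reported value.

```python
import math
from typing import Sequence


def brevity_penalty(hyp_len: int, ref_len: int) -> float:
    """BLEU brevity penalty: 1 if the hypothesis is longer than the
    reference, otherwise exp(1 - ref_len / hyp_len)."""
    if hyp_len > ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / hyp_len)


def bleu_from_precisions(precisions: Sequence[float], bp: float) -> float:
    """Geometric mean of the n-gram precisions (in percent) times BP."""
    log_mean = sum(math.log(p) for p in precisions) / len(precisions)
    return bp * math.exp(log_mean)


theano_bp = brevity_penalty(52856, 53827)  # about 0.982, as reported
tf_bp = brevity_penalty(56182, 53827)      # 1.0: output longer than reference
print(round(theano_bp, 3), tf_bp)
print(bleu_from_precisions([53.8, 26.8, 15.4, 9.2], tf_bp))
```

The lengths show that the converted model produces noticeably longer output than the Theano model on the same input, while all four n-gram precisions are lower.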

stdout pollution

I translated from stdin to stdout. This garbage appeared in my stdout along with the translated text.
['nvcc', '-shared', '-O3', '-m64', '-Xcompiler', '-DCUDA_NDARRAY_CUH=c72d035fdf91890f3b36710688069b2e,-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION,-fPIC,-fvisibility=hidden', '-Xlinker', '-rpath,/home/heafield/.theano/compiledir_Linux-4.4--generic-x86_64-with-Ubuntu-16.04-xenial-x86_64-2.7.12-64/cuda_ndarray', '-I/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda', '-I/usr/local/lib/python2.7/dist-packages/numpy/core/include', '-I/usr/include/python2.7', '-I/usr/local/lib/python2.7/dist-packages/theano/gof', '-o', '/home/heafield/.theano/compiledir_Linux-4.4--generic-x86_64-with-Ubuntu-16.04-xenial-x86_64-2.7.12-64/cuda_ndarray/cuda_ndarray.so', 'mod.cu', '-L/usr/lib', '-lcublas', '-lpython2.7', '-lcudart']
ESC]0;IPython: ro-en/docs^G

(Note the garbage has been postprocessed)

translate.py outputs probabilities greater than one

python $nematus/translate.py \
    -m $prefix.dev.npz \
    -i $file_base.$src -o $file_base.$src.output.dev -k 1 -n -p 5 --suppress-unk --print-word-probabilities

results in something like:

ein Kampf der Republikaner gegen die Wiederwahl Obamas
1.98620128632 0.375202327967 0.935490012169 0.990142166615 5.79434633255 0.540984451771 0.961822271347 1.74049687386 0.97704654932

Any idea why this happens? Is it related to the length normalization?

major differences between translation time and training time

When I use translate.py the results are quite weird. If I train my model for a low number of iterations, I end up with blank translations (each sentence from the test set is translated to eos right away).
If I train my model for a high number of iterations, I end up with sentences that consist of a single repeated word.

The validation examples during training time look OK (--sampleFreq param).

Do you have any thoughts?

Float16 does not work

In this branch, I removed all hardcoded references to float32 and I tried to train with float16, but it does not work:

Using cuDNN version 5105 on context None
Mapped name None to device cuda0: TITAN X (Pascal) (0000:02:00.0)
Loading data
Building model
Building sampler
Building f_init... Done
Building f_next.. Done
Building f_log_probs... Done
Computing gradient... Done
Building optimizers...Disabling C code for Elemwise{Cast{float32}} due to unsupported float16
Done
Total compilation time: 198.4s
Optimization
Seen 846 samples
NaN detected

I've also tried increasing the epsilon in the Adam optimizer, but it doesn't solve the issue.

number of next word predictions at translation time

Dear nematus community,

I started working with neural machine translation a couple of months ago. I have a question about the nematus code, and I am sorry if it is too basic.

At translation time, in the function gen_sample() in nmt.py, there is not a single next_w prediction per step; in my experiments so far the number varies, up to 5. Why is that?

So basically, I am getting alignment vector, context vector and next word predictions at translation time under the following loop:

# x is a sequence of word ids followed by 0, eos id
for ii in xrange(maxlen):

But at each time step the alignment vector, context vector and next word vector contain more than one value, up to 5.

Why?

Thanks for your time and answer.

Kind Regards,
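
One possible explanation, offered here as an assumption rather than a confirmed answer, is beam search: each decoding step scores one continuation set per live hypothesis, so the per-step arrays have one row per hypothesis, up to the beam size. The toy sketch below records how many hypotheses are live at each step. It is not gen_sample(); the step function, its probabilities and all names are made up.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[List[str], float]  # (tokens, accumulated log-probability)


def beam_search(step: Callable[[List[str]], Dict[str, float]],
                beam_size: int = 5, maxlen: int = 10,
                eos: str = "</s>") -> Tuple[Hypothesis, List[int]]:
    """Toy beam search that also records the live hypothesis count per step."""
    live: List[Hypothesis] = [([], 0.0)]
    finished: List[Hypothesis] = []
    live_counts: List[int] = []
    for _ in range(maxlen):
        if not live:
            break
        live_counts.append(len(live))
        candidates: List[Hypothesis] = []
        for tokens, score in live:  # one prediction per live hypothesis
            for word, prob in step(tokens).items():
                candidates.append((tokens + [word], score + math.log(prob)))
        candidates.sort(key=lambda hyp: hyp[1], reverse=True)
        width = beam_size - len(finished)  # never negative, see loop below
        live = []
        for hyp in candidates[:width]:
            (finished if hyp[0][-1] == eos else live).append(hyp)
    best = max(finished + live, key=lambda hyp: hyp[1])
    return best, live_counts


def toy_step(prefix: List[str]) -> Dict[str, float]:
    """Made-up model: ending becomes likelier as the prefix grows."""
    p_end = min(0.2 + 0.3 * len(prefix), 0.9)
    rest = (1.0 - p_end) / 2
    return {"</s>": p_end, "a": rest, "b": rest}


best, counts = beam_search(toy_step, beam_size=5)
print(best, counts)
```

In this sketch the first step has a single live hypothesis and later steps have several, never more than the beam size, which would match a count that varies up to 5.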

the copy_unknown_words.py can not be used

Hi Rico,
Have you considered outputting the original word instead of the UNK symbol? I found that the script copy_unknown_words.py in the UTILS folder cannot replace the unknown words in target sentences with their aligned words in source sentences.
Do you have any suggestions ?

Please consider providing a demo container

Ideally a container with a pre-trained model in it could be available so that we can easily try the system without having to run numerous setup and training steps manually.

nematus/server directory deleted during TF merge, breaking server.py

After the 9b1ebb5 merge, on a fresh install, nematus/server.py is going to crash since these two lines

from server.response import TranslationResponse
from server.api.provider import request_provider, response_provider

refer to the now-deleted server directory. I've figured out a way to revert that using PyCharm (I'm not enough of a git wizard to do it "by hand", I guess 😄) that conserves the files' git history using git log --full-history (my test commit is the only one I see via the usual git log nematus/server), which I can submit as a PR if you'd like.

Also, 🎉 for TF and Python 3 compatibility!

maxlen in data iterators

In domain_iterator.py and domain_interpolation_data_iterator.py there is this fragment of code:

            if len(ss) > self.maxlen and len(tt) > self.maxlen:
                continue

which skips a sentence pair if both the source sentence and the target sentence exceed the maximum allowed length. Is this behavior correct? Shouldn't we skip the pair if any of the sentences exceeds the maximum length?
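
The difference between the two conditions can be made concrete with a small sketch. The function names are hypothetical; the first mirrors the quoted fragment and the second is the alternative proposed in the question.

```python
from typing import List


def skip_if_both_too_long(ss: List[str], tt: List[str], maxlen: int) -> bool:
    """Condition as quoted: skip only when both sides exceed maxlen."""
    return len(ss) > maxlen and len(tt) > maxlen


def skip_if_either_too_long(ss: List[str], tt: List[str], maxlen: int) -> bool:
    """Alternative from the question: skip when either side exceeds maxlen."""
    return len(ss) > maxlen or len(tt) > maxlen


long_source = ["w"] * 5
short_target = ["w"] * 2
print(skip_if_both_too_long(long_source, short_target, maxlen=3))
print(skip_if_either_too_long(long_source, short_target, maxlen=3))
```

With the quoted condition, a pair with an over-long source and a short target is kept, so sequences longer than maxlen can still reach the model.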

score.py can not save alignments

Traceback (most recent call last):
File "/fs/meili0/amiceli/nematus-crelu/nematus/score.py", line 132, in <module>
args.output, b=args.b, normalization_alpha=args.n, verbose=args.v, alignweights=args.walign)
File "/fs/meili0/amiceli/nematus-crelu/nematus/score.py", line 106, in main
rescore_model(source_file, nbest_file, saveto, models, options, b, normalization_alpha, verbose, alignweights)
File "/fs/meili0/amiceli/nematus-crelu/nematus/score.py", line 91, in rescore_model
for line in all_alignments:
NameError: global name 'all_alignments' is not defined

Epoch number is lost after resume

When training has been interrupted and is later resumed the number of iterations is correctly displayed, however, the epoch number is lost and does not seem to increase later on during training.

TypeError: param_init_embedding_layer() missing 2 required positional arguments: 'n_words' and 'dims'

I have been trying to get nematus to work using the wmt16 model, but I think there is something wrong in the code. I tried to fix it but no matter what I (guessed) to change, it keeps crashing in different places. Here's the root problem:

Warning: No built-in rules for language de.
Detokenizer Version $Revision: 4134 $
Language: de
Tokenizer Version 1.1
Language: en
Number of threads: 1
Translating <stdin> ...
Using cuDNN version 5110 on context None
Mapped name None to device cuda: GeForce GTX 960M (0000:01:00.0)
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/media/data/translation/nematus/nematus/translate.py", line 54, in translate_model
    f_init, f_next = build_sampler(tparams, option, use_noise, trng, return_alignment=return_alignment)
  File "/media/data/translation/nematus/nematus/nmt.py", line 359, in build_sampler
    x, ctx = build_encoder(tparams, options, trng, use_noise, x_mask=None, sampling=True)
  File "/media/data/translation/nematus/nematus/nmt.py", line 186, in build_encoder
    emb = get_layer_constr('embedding')(tparams, x, suffix='', factors= options['factors'])
TypeError: param_init_embedding_layer() missing 2 required positional arguments: 'n_words' and 'dims'
Error: translate worker process 10737 crashed with exitcode 1

If you look at layers.py, line 76, I think it's true. But where to get n_words and dims from?

translate.py

Hi, I ran my training for 350 epochs and got these model files:
model.npz-30000.data-00000-of-00001 model.npz.index
model.npz-30000.index model.npz.json
model.npz-30000.meta model.npz.meta
model.json model.npz-30000.progress.json model.npz.progress.json
model.npz model.npz.data-00000-of-00001

Then i ran the command ~/data/nematus-master# python nematus/score.py --models model model.npz model.npz.progress model.npz-30000.meta model.npz-30000.progress --source /root/data/nematus-master/data2/test/decldesc_test_bpe --target /root/data/nematus-master/data2/test/bodies_test_bpe --output /root/data/nematus-master/data2/test/output2.txt

and I got an error:
Traceback (most recent call last):
File "nematus/score.py", line 82, in <module>
main(source_file, target_file, output_file, scorer_settings)
File "nematus/score.py", line 68, in main
fill_options(options[-1])
File "/root/data/nematus-master/nematus/compat.py", line 19, in fill_options
first_factor_size = options['n_words_src']
KeyError: u'n_words_src'

Can you please tell me why this error occurs? Please point out where I am going wrong.

nematus and device=cuda versus device=gpu in theano 0.8.2 vs dev

Hi Nematus Team,

Apologies in advance if this is a known issue or if I am misunderstanding something.

For reference, our hardware is an Intel Xeon 1620v3 and a GeForce 1070.

We are able to use nematus quite easily under Debian 8.7, Theano 0.8.2, running:
THEANO_FLAGS=mode=FAST_RUN,floatX=float32,device=gpu ./test_train.sh
and
THEANO_FLAGS=mode=FAST_RUN,floatX=float32,device=gpu ./test_translate.sh

When moving to Theano 0.9.0dev5.dev, this command:
THEANO_FLAGS=mode=FAST_RUN,floatX=float32,device=gpu ./test_train.sh
works fine but results in a message about a deprecated device interface:

WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release.  Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
 https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

So we use this instead:
THEANO_FLAGS=mode=FAST_RUN,floatX=float32,device=cuda ./test_train.sh

So far, so good. We see a speed improvement from 147 sentences/sec to 204 sentences/sec with device=cuda instead of device=gpu.

However, when we run:
THEANO_FLAGS=mode=FAST_RUN,floatX=float32,device=cuda ./test_translate.sh

we receive the following output:

Translating ../../en-de/in ...
Using cuDNN version 5105 on context None
Mapped name None to device cuda0: GeForce GTX 1070 (0000:02:00.0)
Building f_init... Done
Building f_next.. Done
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/cmc/nmt/nematus/nematus/translate.py", line 72, in translate_model
    seq = _translate(x)
  File "/home/cmc/nmt/nematus/nematus/translate.py", line 52, in _translate
    suppress_unk=suppress_unk, return_hyp_graph=return_hyp_graph)
  File "/home/cmc/nmt/nematus/nematus/nmt.py", line 489, in gen_sample
    ret = f_init[i](x)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 886, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/usr/local/lib/python2.7/dist-packages/theano/gof/link.py", line 325, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 873, in __call__
    self.fn() if output_subset is None else\
RuntimeError: Invalid value or operation
Apply node that caused the error: GpuAdvancedSubtensor1(Wemb, GpuReshape{1}.0)
Toposort index: 73
Inputs types: [GpuArrayType<None>(float32, (False, False)), GpuArrayType<None>(int64, (False,))]
Inputs shapes: [(85000, 500), (10,)]
Inputs strides: [(2000, 4), (-8,)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[GpuIncSubtensor{InplaceSet;::, int64:int64:}(GpuAlloc<None>{memset_0=True}.0, GpuAdvancedSubtensor1.0, Constant{0}, ScalarFromTensor.0)]]

However, it is still possible to run it with device=gpu (albeit with the deprecation warning message).

We appreciate any advice you can give us!

training became extremely slow on GPU

Hi,

I am training a translation model on around 15 million parallel sentences. After around 15 epochs, training speed dropped from around 70s/s to around 20s/s.

I use a Tesla K40 and cuDNN 5.1.

I have checked CPU and GPU usage: the GPU is allocated, and only 1 CPU is allocated.

What might be the problem?

Thanks,

Error when run test_train.sh

There is an error message in the console:
Loading data
Building model
Traceback (most recent call last):
File "../nematus/nmt.py", line 1208, in
train(**vars(args))
File "../nematus/nmt.py", line 795, in train
build_model(tparams, model_options)
File "../nematus/nmt.py", line 237, in build_model
x, ctx = build_encoder(tparams, options, trng, use_noise, x_mask, sampling=False)
File "../nematus/nmt.py", line 194, in build_encoder
profile=profile)
File "/Users/Mr.Wu/wup/nematus/nematus/layers.py", line 171, in gru_layer
strict=True)
File "/Users/Mr.Wu/anaconda/lib/python2.7/site-packages/theano/scan_module/scan.py", line 1041, in scan
scan_outs = local_op(*scan_inputs)
File "/Users/Mr.Wu/anaconda/lib/python2.7/site-packages/theano/gof/op.py", line 611, in call
node = self.make_node(*inputs, **kwargs)
File "/Users/Mr.Wu/anaconda/lib/python2.7/site-packages/theano/scan_module/scan_op.py", line 538, in make_node
inner_sitsot_out.type.dtype))
ValueError: When compiling the inner function of scan the following error has been encountered: The initial state (outputs_info in scan nomenclature) of variable IncSubtensor{Set;:int64:}.0 (argument number 3) has dtype float32, while the result of the inner function (fn) has dtype float64. This can happen if the inner function of scan results in an upcast or downcast.

If you suspect this is an IPython bug, please report it at:
https://github.com/ipython/ipython/issues
or send an email to the mailing list at [email protected]

You can print a more detailed traceback right now with "%tb", or use "%debug"
to interactively debug it.

Extra-detailed tracebacks for bug-reporting purposes can be enabled via:
%config Application.verbose_crash=True

My environment is:
Python 2.7.3
numpy 1.11.3
theano 0.8.2
Could you help me solve it? Thanks.

Domain Adaptation with nematus

Hi folks,

I have trained a translation model on one dataset. After the 510000th iteration, I killed the training and started a new training run on a new dataset, starting from the model saved at iteration 510000. To do this, I created a new models folder and copied model.iter510000.npz as model.npz and model.iter510000.npz.gradinfo.npz as model.npz.gradinfo.npz. But I forgot to copy model.iter510000.progress.json to my new models folder.

Theoretically, it shouldn't affect the fact that I continue training from the 510000th iteration, right? I ask because, since I did not copy the progress.json file, the output shows training as if I had started from 0.
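
A minimal sketch of seeding a new models folder from a checkpoint so that the progress file travels with the parameters. The source file names follow the ones quoted above; the helper name `seed_from_checkpoint` and the destination name `model.npz.progress.json` are assumptions for illustration and may not match every Nematus version.

```python
import os
import shutil


def seed_from_checkpoint(src_dir, dst_dir, iteration, base="model"):
    """Copy one checkpoint into dst_dir under the default model name.

    Hypothetical helper: the file-name pattern mirrors the names quoted
    in the post and may differ between Nematus versions.
    """
    os.makedirs(dst_dir, exist_ok=True)
    ckpt = "%s.iter%d" % (base, iteration)
    # (source name, destination name) pairs. The destination name of the
    # progress file is an assumption.
    pairs = [
        (ckpt + ".npz", base + ".npz"),
        (ckpt + ".npz.gradinfo.npz", base + ".npz.gradinfo.npz"),
        (ckpt + ".progress.json", base + ".npz.progress.json"),
    ]
    missing = []
    for src_name, dst_name in pairs:
        src = os.path.join(src_dir, src_name)
        if os.path.exists(src):
            shutil.copyfile(src, os.path.join(dst_dir, dst_name))
        else:
            missing.append(src_name)
    # Report files that were not found so that nothing is skipped silently.
    return missing
```

Returning the list of missing files makes a forgotten progress file visible at copy time rather than when the training log restarts from 0.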

Using Minimum Risk Training (MRT)

Hi Rico,

We are trying to use the MRT feature of Nematus but are somehow unable to train a model properly. Please find attached the configuration of the model, and let us know if you see any issue in it.

config.py.txt

The trained model is not even able to translate properly (the translation output is only ".").

n_factors related bug in nmt.py

Hi all,

There are 2 conditions that can trigger a list index out of range exception (see below), one in nmt.py (condition 1) and one in data_iterator.py (condition 2):

  1. When the first sentence of a mini-batch is an empty line and the skip_empty flag is not set to True.
  2. When we don't want to use any factor other than the word itself, but the first sentence of a mini-batch starts with a | symbol in the training data.

I think it would be better to catch such exceptions with a try/except block.
I'll test it and raise a PR if you want.

##############################
The exception looks like this:

Traceback (most recent call last):
  File "../../nematus/nmt_worker.py", line 666, in <module>
    update_algorithm=args.update_algorithm
  File "../../nematus/nmt_worker.py", line 443, in train_on_multi_gpu
    epoch, x, x_mask, y, y_mask = get_one_mini_batch(train_it)
  File "../../nematus/nmt_worker.py", line 422, in get_one_mini_batch
    n_words=n_words)
  File "/nematus/nmt.py", line 68, in prepare_data
    n_factors = len(seqs_x[0][0])
IndexError: list index out of range

If you suspect this is an IPython bug, please report it at:
    https://github.com/ipython/ipython/issues
or send an email to the mailing list at [email protected]

You can print a more detailed traceback right now with "%tb", or use "%debug"
to interactively debug it.

Extra-detailed tracebacks for bug-reporting purposes can be enabled via:
    %config Application.verbose_crash=True
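
The IndexError in the traceback comes from `len(seqs_x[0][0])`, which fails when the first sentence of the batch has no tokens. As a rough sketch of one possible guard (covering condition 1 only), the hypothetical helper `count_factors` below looks for the first non-empty token anywhere in the batch instead of assuming it is in the first sentence. It is not part of Nematus, and the nested-list shape of `seqs_x` is inferred from the indexing in the traceback.

```python
def count_factors(seqs_x, default=1):
    """Return the number of factors per token, or `default` if unknown.

    seqs_x is assumed to be a list of sentences, each a list of tokens,
    each token a list of factors (the shape implied by seqs_x[0][0]).
    """
    for sentence in seqs_x:
        for token in sentence:
            if token:  # first token that has at least one factor
                return len(token)
    return default  # every sentence in the batch was empty


# The first sentence is empty, the second has two tokens with 2 factors each.
batch = [[], [[3, 7], [4, 1]]]
n_factors = count_factors(batch)
```

Compared with wrapping the original line in try/except, scanning for a non-empty token keeps the factor count correct for the rest of the batch instead of just suppressing the error.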

Algorithm

Hi,
Can you please tell me what algorithm your neural machine translation system uses, and which steps make it different from other neural machine translation systems?
I would appreciate it if you could describe the algorithm.

best,
Bhagat

EOFError in ERROR: test_ende (__main__.TestTranslate)

I'm a beginner in NMT. I got an error when I tried to run test_translate.py with the command: THEANO_FLAGS=mode=FAST_RUN,floatX=float32,device=cpu python test_translate.py

The feedback was:
1522250383082

It will be great if you can help with my problem. Thanks a lot!

little guidance required

Hey, if I want to use Nematus for a code2doc task, how should I use it? Can you please guide me through the steps? Thank you so much.
I have already done all the preprocessing tasks as you suggested. Now I do not understand how to use the vocab file, the .json file, and the x.train and y.train files.

Cost 0.0 in training

Hi!

I'm running the latest version of Nematus on a machine with Ubuntu 16.04, cuda-8.0 and Theano 0.9, but when I run test_train.sh the cost of each iteration is 0.0.
The same thing happens when running an actual training on parallel data (IWSLT 2016 en-fr). The objective function is cross-entropy.
I have used previous versions of Nematus and never encountered this problem.

KeyError: 'deep_fusion_lm'

File "/nematus/nematus/nmt.py", line 609, in build_sampler
if options['deep_fusion_lm']:
KeyError: 'deep_fusion_lm'
Does anyone know how to solve the problem? Thank you:)
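
A KeyError like this typically means the loaded model config predates the option. One possible workaround, in the spirit of the `fill_options` function in compat.py that appears in a traceback elsewhere on this page, is to fill in missing keys before the sampler is built. The sketch below is an illustration only: `fill_missing_options`, `ASSUMED_DEFAULTS`, the default value `None` and the `dim` entry are assumptions, not the project's actual code or defaults.

```python
def fill_missing_options(options, defaults):
    """Add default values for keys an older model config does not have."""
    added = []
    for key, value in defaults.items():
        if key not in options:
            options[key] = value
            added.append(key)
    return added


# Hypothetical defaults: the real values depend on the Nematus version.
ASSUMED_DEFAULTS = {"deep_fusion_lm": None}

options = {"dim": 1024}  # stand-in for a config loaded from an older model
added = fill_missing_options(options, ASSUMED_DEFAULTS)

# With the key present, a check like `if options['deep_fusion_lm']:`
# evaluates to False instead of raising KeyError.
```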

Training speed benchmarks

Hi,

We are looking into training speed, using the test_train.sh script. Comparing our numbers to the benchmarks currently reported in the readme, our "words/s" numbers are in the range of the reported "sentences/s". So either our training is really slow, or the benchmark numbers should actually be words per second.

Comparing a commit from November (when the benchmarks were added) to the current version supports the latter hypothesis. Could you confirm and adapt the benchmark numbers?

Thank you!
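
To show how far apart the two units are, here is a small conversion sketch. The numbers are made up for illustration and are not measurements from any benchmark; the average sentence length in particular is an assumption.

```python
def words_per_second(sentences_per_sec, avg_words_per_sentence):
    """Convert a sentences/s rate into words/s."""
    return sentences_per_sec * avg_words_per_sentence


def sentences_per_second(words_per_sec, avg_words_per_sentence):
    """Convert a words/s rate into sentences/s."""
    return words_per_sec / avg_words_per_sentence


# Illustrative numbers only: 200 words/s at 25 words per sentence.
rate = sentences_per_second(200.0, 25.0)
print(rate)  # 8.0
```

With sentences of a few dozen words, a figure reported in the wrong unit is off by more than an order of magnitude, which is why the mismatch is easy to spot.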

build_dictionary.py : python 3 version

I just wanted to upload a python 3.6 version of the build_dictionary.py file for anyone that would like to use it.

I used this stackoverflow suggestion as the reasoning behind my changes.
https://stackoverflow.com/questions/39284842/order-dictionary-index-in-python

#!/usr/bin/env python3

import json
import sys
from collections import OrderedDict

import numpy


def main():
    for filename in sys.argv[1:]:
        print('Processing', filename)
        # Count how often each space-separated token occurs.
        word_freqs = OrderedDict()
        with open(filename, 'r', encoding='utf-8') as f:
            for line in f:
                words_in = line.strip().split(' ')
                for w in words_in:
                    if w not in word_freqs:
                        word_freqs[w] = 0
                    word_freqs[w] += 1
        words = list(word_freqs.keys())
        freqs = list(word_freqs.values())

        # Sort tokens by descending frequency.
        sorted_idx = numpy.argsort(freqs)
        sorted_words = [words[ii] for ii in sorted_idx[::-1]]

        # Ids 0 and 1 are reserved for the end-of-sentence and unknown tokens.
        worddict = OrderedDict()
        worddict['eos'] = 0
        worddict['UNK'] = 1
        for ii, ww in enumerate(sorted_words):
            worddict[ww] = ii + 2

        with open('%s.json' % filename, 'w', encoding='utf-8') as f:
            json.dump(worddict, f, indent=2, ensure_ascii=False)

        print('Done')


if __name__ == '__main__':
    main()
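
For comparison, the same mapping can be sketched with only the standard library. This is an alternative illustration, not the script shipped with Nematus: `build_worddict` is a hypothetical name, and the order of tokens with equal counts may differ from the argsort-based version.

```python
from collections import Counter, OrderedDict


def build_worddict(lines):
    """Map each token to an id, most frequent first; ids 0 and 1 are reserved."""
    counts = Counter()
    for line in lines:
        counts.update(line.strip().split(' '))
    worddict = OrderedDict([('eos', 0), ('UNK', 1)])
    # most_common() sorts by descending count; tokens with equal counts
    # keep the order in which they were first seen.
    for idx, (word, _) in enumerate(counts.most_common(), start=2):
        worddict[word] = idx
    return worddict


vocab = build_worddict(["a b a", "c a b"])
```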

rescore.py with -w parameter

Hi Rico,
Thank you for Nematus, it is such a great tool. We've run into a slight problem while using reranking with the -w option to get the attention matrix, e.g.:

THEANO_FLAGS=mode=FAST_RUN,floatX=float32,device=$device,on_unused_input=warn,lib.cnmem=0.8 python $nematus/nematus/rescore.py \
     -m model.r2l.npz \
     -s  preprocessed.tok \
     -i  reversed.tok \
     -o rescored.tok \
     -b 80 -n -w
  File "/home/current_nematus/nematus/nematus/rescore.py", line 97, in <module>
    main(source_file, nbest_file, output_file, rescorer_settings)
  File "/home/current_nematus/nematus/nematus/rescore.py", line 88, in main
    rescore_model(source_file, nbest_file, output_file, rescorer_settings, options)
  File "/home/current_nematus/nematus/nematus/rescore.py", line 77, in rescore_model
    align_OUT.write(line + "\n")
TypeError: can only concatenate list (not "str") to list

As you can see, the variable line is a list, so I've made a simple modification to the code: I added a type check on the line before the error occurs (line 76 in the current rescore.py), changing:

if rescorer_settings.alignweights:
	for line in alignments:
		align_OUT.write(line + "\n")

to:

if rescorer_settings.alignweights:
	for line in alignments:
		# An entry may be a list of strings rather than a single string.
		if isinstance(line, list):
			for l in line:
				align_OUT.write(l + "\n")
		else:
			align_OUT.write(line + "\n")

I'm not sure whether this is the correct way to handle it, but it works fine for us, and I hope it helps if somebody else comes across this issue. If there is a better way to solve this, please let me know if I can be of any help.
Thanks, Josef.

code generation

For code generation, you mention using Nematus for NMT where the input is a declaration and docstring and the output is the body. How is the code then generated?

In translation such as English to German, we know that "hi" means "hello". How does the translation actually work here, and how is the tokenization done?

I am new to code generation, so I need some advice on implementing this model. I have also read about how Nematus works, but nothing is described about the translation of declaration, docstring and body.
I would be very thankful if you could guide me on how to use Nematus for code generation.

Better message for encoder_truncate_gradient

Ran this (which I assume is a slightly out of date model):

THEANO_FLAGS=mode=FAST_RUN,floatX=float32,device=gpu,on_unused_input=warn python /fs/magni0/heafield/ro-en/nematus/nematus/translate.py -m /mnt/baldur0/rsennrich/wmt16_neural/ro-en/exp6/model.npz -k 12 -n -p 1 --suppress-unk

Get an error message:

Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/fs/magni0/heafield/ro-en/nematus/nematus/translate.py", line 42, in translate_model
    f_init, f_next = build_sampler(tparams, option, use_noise, trng, return_alignment=return_alignment)
  File "/mnt/magni0/heafield/ro-en/nematus/nematus/nmt.py", line 393, in build_sampler
    x, ctx = build_encoder(tparams, options, trng, use_noise, x_mask=None, sampling=True)
  File "/mnt/magni0/heafield/ro-en/nematus/nematus/nmt.py", line 229, in build_encoder
    truncate_gradient=options['encoder_truncate_gradient'],
KeyError: 'encoder_truncate_gradient'

Note that this config does not mention encoder_truncate_gradient at all.
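
One way such a message could be improved is to check the loaded config for required keys up front and report all missing ones together. The sketch below uses a hypothetical helper, `require_options`; it is not the project's actual code, and the wording of the message is only a suggestion.

```python
def require_options(options, required, config_path="<unknown>"):
    """Raise one readable error listing every missing config key."""
    missing = [key for key in required if key not in options]
    if missing:
        raise KeyError(
            "model config %s lacks option(s) %s; it may have been saved by "
            "an older version of the code" % (config_path, ", ".join(missing))
        )


# Example: an older config that never stored encoder_truncate_gradient.
old_config = {"dim": 1024}
try:
    require_options(old_config, ["encoder_truncate_gradient"], "model.npz.json")
except KeyError as err:
    message = str(err)
```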

json and pkl files missing

Hi, I think the translate.py script (at line 111) should say something more meaningful when both the .pkl and .json files are missing. I'm currently missing both files, I think it's because saveFreq was not set properly.
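
A sketch of what a more meaningful failure could look like. The function name `load_config` and the `<model>.json` / `<model>.pkl` naming are assumptions based on the description above, not a copy of the code in translate.py.

```python
import json
import os
import pickle


def load_config(model_path):
    """Load options from <model_path>.json, falling back to <model_path>.pkl."""
    json_path = model_path + ".json"
    pkl_path = model_path + ".pkl"
    if os.path.exists(json_path):
        with open(json_path, "r") as f:
            return json.load(f)
    if os.path.exists(pkl_path):
        with open(pkl_path, "rb") as f:
            return pickle.load(f)
    # Name both paths so the user can see exactly what was looked for.
    raise IOError(
        "no model config found: tried %s and %s (was the model saved "
        "at least once during training?)" % (json_path, pkl_path)
    )
```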

Spanish translation

Hello, I would like to collaborate with the Spanish translation, is it done?

nematus/score.py broken because of error in nematus/theano_util.py

Hi there,

I've been testing score.py as in master commit b5469b4 with the following script.

#!/bin/sh

# theano device, in case you do not want to compute on gpu, change it to cpu
# device=gpu
device=cpu

# path to nematus ( https://www.github.com/rsennrich/nematus )
nematus=~/Research/Resources/nematus

## Path to the directory to save corpus data
DATA=..

# path to source files
ST=$DATA/alignments/sentence/mbitexts/word/en_ceb

# SL
SL=en

# TL
TL=es

# path to the target files
TT=$DATA/alignments/sentence/mbitexts/word/es

# path to the output directory
OUTDIR=$DATA/alignments/sentence/nmt_cbe_output

# mkdir OUTDIR
mkdir -p $OUTDIR

## model
MODEL=~/CORPORA/nmt-cristina/model_L1L2w_v80k.npz

for i in $ST/*.txt
do
    echo ${i##*/}
    THEANO_FLAGS=mode=FAST_RUN,floatX=float32,device=$device,on_unused_input=warn python $nematus/nematus/score.py \
         -b 80 \
         -v \
         -m $MODEL \
         -s $i \
         -t $TT/${i##*/} \
         -o $OUTDIR/${i##*/}
done

And I got an error whose traceback is as follows:

Traceback (most recent call last):
  File "/Users/jmmmac/Research/Resources/nematus/nematus/score.py", line 132, in <module>
    args.output, b=args.b, normalization_alpha=args.n, verbose=args.v, alignweights=args.walign)
  File "/Users/jmmmac/Research/Resources/nematus/nematus/score.py", line 106, in main
    rescore_model(source_file, nbest_file, saveto, models, options, b, normalization_alpha, verbose, alignweights)
  File "/Users/jmmmac/Research/Resources/nematus/nematus/score.py", line 35, in rescore_model
    params = load_params(model, param_list)
  File "/Users/jmmmac/Research/Resources/nematus/nematus/theano_util.py", line 72, in load_params
    new_params[with_prefix+kk] = pp[kk].astype(floatX, copy=False)
TypeError: float() argument must be a string or a number

Best!

ValueError: Parent directory of model doesn't exist, can't save.

INFO: Validation loss (AVG/SUM/N_SENT): 212.140906401 93129.8579102 439
2018-06-30 02:08:31.421233: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at save_restore_v2_ops.cc:109 : Not found: ; No such file or directory
Traceback (most recent call last):
File "nematus/nmt.py", line 692, in
train(config, sess)
File "nematus/nmt.py", line 313, in train
saver.save(sess, save_path=config.saveto)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1720, in save
raise exc
ValueError: Parent directory of model doesn't exist, can't save.

Can you please tell me why I am getting this error when I run the Nematus model for a code generation task?
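
The message suggests that the directory part of the save path does not exist when the checkpoint is written. A minimal Python 3 sketch of creating it beforehand follows; `ensure_parent_dir` is a hypothetical helper, and where the save path is configured depends on the training setup.

```python
import os


def ensure_parent_dir(save_path):
    """Create the directory that will contain save_path, if needed."""
    parent = os.path.dirname(os.path.abspath(save_path))
    os.makedirs(parent, exist_ok=True)  # no error if it already exists
    return parent
```

Calling this with the configured save path before training starts avoids losing a run at the first checkpoint.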

How can I get the context vector at translation time?

At translation time, I would like to get the context vector (with soft attention) and the word embedding for each word wi, along with all the alphas and hidden states. For example (English to French):

h1 h2 h3 h4 h5 h6
| | | | | |
the boy played -> Le garçon jouait
| | | | | |
x1 x2 x3 x4 x5 x6

If we look at the word "garçon" at translation time, I would like to get the sum alpha1*h1 + alpha2*h2 + alpha3*h3, as well as h1, h2, h3, alpha1, alpha2 and alpha3 for this word.
I would also like to get the source word embeddings (x1, x2, x3).
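
To make the requested quantity concrete, here is a toy computation of a soft-attention context vector as the weighted sum of the source hidden states. It only illustrates the arithmetic with made-up values; it does not show how to extract these tensors from Nematus, and all names are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw attention scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_vector(alphas, hidden_states):
    """Weighted sum c = sum_i alpha_i * h_i over the source positions."""
    dim = len(hidden_states[0])
    return [sum(a * h[d] for a, h in zip(alphas, hidden_states))
            for d in range(dim)]


# Toy values, not taken from any trained model.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # h1, h2, h3
alphas = softmax([0.0, 0.0, 0.0])         # equal scores give weights of 1/3 each
c = context_vector(alphas, h)
```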

IOError

Hi,
I'm getting some strange IOError exceptions while running Nematus training (both on baldur and meili).

Traceback (most recent call last):
File "train.py", line 60, in
main()
File "train.py", line 56, in main
external_validation_script=WDIR + '/scripts/validate.sh')
File "/fs/meili0/amiceli/nematus-dev/nematus/nmt.py", line 964, in train
numpy.savez(saveto, history_errs=history_errs, uidx=uidx, **params)
File "/mnt/meili0/rsennrich/tools/virtual_environment/local/lib/python2.7/site-packages/numpy/lib/npyio.py", line 574, in savez
_savez(file, args, kwds, False)
File "/mnt/meili0/rsennrich/tools/virtual_environment/local/lib/python2.7/site-packages/numpy/lib/npyio.py", line 642, in _savez
zipf.write(tmpfile, arcname=fname)
File "/usr/lib/python2.7/zipfile.py", line 1184, in write
self.fp.write(buf)
IOError: [Errno 5] Input/output error

Python documentation says that Errno 5 is a generic I/O error.

I've also got some IOErrors of the same kind while reading the corpus in data_iterator.py but I kinda fixed them just by catching the exception and reshuffling and reopening the files.

Any ideas on what is going on?
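
The catch-and-retry workaround described above could be generalised along these lines. This is a sketch under the assumption that the errors are transient; `retry_on_io_error` is a hypothetical helper, and retrying does not address whatever causes the I/O errors in the first place.

```python
import time


def retry_on_io_error(action, attempts=3, delay=0.0):
    """Call action() and retry if it raises IOError/OSError."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except (IOError, OSError):
            if attempt == attempts:
                raise  # give up and surface the last error
            time.sleep(delay)
```

A save call can then be wrapped in a small function or lambda and passed as `action`, with a non-zero `delay` to give a network file system time to recover.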

Approximated softmax

Hi,

I am wondering why the last layer is a full softmax and not an approximated version such as hierarchical softmax or noise contrastive estimation.
Maybe the speed improvement wouldn't be significant?

Thanks,
Mattia

we train Neural Machine Translation (NMT) models in both direction using Nematus

Hi,
I was reading your paper "A Parallel Corpus of Python Functions and Documentation Strings for Automated Code Documentation and Code Generation", where you say that you trained "Neural Machine Translation (NMT) models in both direction using Nematus".

My question is how to preprocess the dataset, given that it is a parallel corpus of Python functions and docstrings.
For example, with a text-only parallel corpus such as eng-deu we use word embeddings. What should we use in this case?
Thank you so much.
