Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement

PyTorch implementation of the models described in the paper Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement.

We present code for training and decoding both autoregressive and non-autoregressive models, as well as preprocessed datasets and pretrained models.
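The core idea of the paper can be sketched in a few lines: a non-autoregressive decoder produces an initial translation in one parallel pass, and a second decoder repeatedly rewrites it. The sketch below is a toy illustration of that loop, not the repo's actual API; `encode`, `decode_init`, and `refine` are hypothetical stand-ins for the encoder, first-pass decoder, and refinement decoder.

```python
# Toy sketch of iterative-refinement decoding (hypothetical names, not the repo's API):
# one parallel first pass, then a fixed number of refinement passes.

def iterative_refinement_decode(encode, decode_init, refine, src, num_iters=20):
    """Return the hypothesis after `num_iters` refinement passes."""
    enc = encode(src)            # encoder states, computed once
    hyp = decode_init(enc)       # initial, fully parallel decoding pass
    for _ in range(num_iters):
        hyp = refine(hyp, enc)   # each pass conditions on the previous output
    return hyp
```

The `--valid_repeat_dec` flag in the commands below controls the number of refinement passes at decoding time.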

Dependencies

Python

  • Python 3.6
  • PyTorch 0.3
  • Numpy
  • NLTK
  • torchtext
  • torchvision

GPU

  • CUDA (we recommend the latest version; version 8.0 was used in all our experiments)

Related code

Downloading Datasets & Pre-trained Models

The original translation corpora can be downloaded from (IWSLT'16 En-De, WMT'16 En-Ro, WMT'15 En-De, MS COCO). For the preprocessed corpora and pre-trained models, see below.

| Dataset | Data | Models |
| --- | --- | --- |
| IWSLT'16 En-De | Data | Models |
| WMT'16 En-Ro | Data | Models |
| WMT'15 En-De | Data | Models |
| MS COCO | Data | Models |

Before you run the code

Set the correct path to the data in the data_path() function located in data.py:
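A minimal sketch of what such a function might look like is below; the actual data_path() in data.py may have a different signature and different dataset keys, and the root path and DL4MT_DATA environment variable here are purely hypothetical placeholders.

```python
# Hypothetical shape of data_path() in data.py; the real function may differ.
# The point is simply to map each dataset name to its local directory.
import os

def data_path(dataset):
    # Edit this root to point at wherever you unpacked the preprocessed data.
    root = os.environ.get("DL4MT_DATA", "/path/to/data")
    paths = {
        "iwslt-ende": os.path.join(root, "iwslt16_ende"),
        "wmt16-enro": os.path.join(root, "wmt16_enro"),
        "wmt15-ende": os.path.join(root, "wmt15_ende"),
        "mscoco": os.path.join(root, "mscoco"),
    }
    return paths[dataset]
```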

Loading & Decoding from Pre-trained Models

  1. For vocab_size, use 60000 for WMT'15 En-De, 40000 for the other translation datasets and 10000 for MS COCO.
  2. For params, use big for WMT'15 En-De and small for the other translation datasets.

Autoregressive

$ python run.py --dataset <dataset> --vocab_size <vocab_size> --ffw_block highway --params <params> --lr_schedule anneal --mode test --debug --load_from <checkpoint>

Non-autoregressive

$ python run.py --dataset <dataset> --vocab_size <vocab_size> --ffw_block highway --params <params> --lr_schedule anneal --fast --valid_repeat_dec 20 --use_argmax --next_dec_input both --mode test --remove_repeats --debug --trg_len_option predict --use_predicted_trg_len --load_from <checkpoint>

For adaptive decoding, add the flag --adaptive_decoding jaccard to the above.
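Adaptive decoding stops refining a sentence early once the output stabilizes; with the jaccard option, the stopping test is presumably based on Jaccard similarity between consecutive decoder outputs. A hedged sketch of such a test (hypothetical function names, not the repo's implementation):

```python
# Sketch of a Jaccard-based stopping test for adaptive decoding:
# stop refining once two consecutive outputs agree as token sets.
# threshold=1.0 stops only on exact agreement; lower values stop earlier.

def jaccard(a, b):
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def should_stop(prev_tokens, curr_tokens, threshold=1.0):
    return jaccard(prev_tokens, curr_tokens) >= threshold
```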

Training New Models

Autoregressive

$ python run.py --dataset <dataset> --vocab_size <vocab_size> --ffw_block highway --params <params> --lr_schedule anneal

Non-autoregressive

$ python run.py --dataset <dataset> --vocab_size <vocab_size> --ffw_block highway --params <params> --lr_schedule anneal --fast --valid_repeat_dec 8 --use_argmax --next_dec_input both --denoising_prob --layerwise_denoising_weight --use_distillation

Training the Length Prediction Model

  1. Take a checkpoint of a pre-trained non-autoregressive model.
  2. Resume training with the same flags used to train the model in step 1, adding: --load_from <checkpoint> --resume --finetune_trg_len --trg_len_option predict
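The --trg_len_option predict flag trains a module that predicts the target length from the encoder states. As a rough illustration (not the repo's actual module), one common design is to pool the encoder states and classify an offset from the source length, bounded by something like the --max_offset flag; everything below, including the class name, is an assumption.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of a length predictor: average-pool the encoder
# states and classify an offset in [-max_offset, +max_offset] that is
# added to the source length to obtain the predicted target length.
class LengthPredictor(nn.Module):
    def __init__(self, d_model, max_offset=20):
        super().__init__()
        self.max_offset = max_offset
        self.proj = nn.Linear(d_model, 2 * max_offset + 1)

    def forward(self, enc_states, src_len):
        # enc_states: (batch, src_len, d_model); src_len: (batch,)
        pooled = enc_states.mean(dim=1)            # average over source positions
        logits = self.proj(pooled)                 # one score per candidate offset
        offset = logits.argmax(dim=-1) - self.max_offset
        return src_len + offset                    # predicted target length
```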

MS COCO dataset

  • Run pre-trained autoregressive model
python run.py --dataset mscoco --params big --load_vocab --mode test --n_layers 4 --ffw_block highway --debug --load_from mscoco_models_final/ar_model --batch_size 1024
  • Run pre-trained non-autoregressive model
python run.py --dataset mscoco --params big --use_argmax --load_vocab --mode test --n_layers 4 --fast --ffw_block highway --debug --trg_len_option predict --use_predicted_trg_len --load_from mscoco_models_final/nar_model --batch_size 1024
  • Train new autoregressive model
python run.py --dataset mscoco --params big --batch_size 1024 --load_vocab --eval_every 1000 --drop_ratio 0.5 --lr_schedule transformer --n_layers 4
  • Train new non-autoregressive model
python run.py --dataset mscoco --params big --use_argmax --batch_size 1024 --load_vocab --eval_every 1000 --drop_ratio 0.5 --lr_schedule transformer --n_layers 4 --fast --use_distillation --ffw_block highway --denoising_prob 0.5 --layerwise_denoising_weight --load_encoder_from mscoco_models_final/ar_model

After training it, train the length predictor (set the correct path in the --load_from argument):

python run.py --dataset mscoco --params big --use_argmax --batch_size 1024 --load_vocab --mode train --n_layers 4 --fast --ffw_block highway --eval_every 1000 --drop_ratio 0.5 --drop_len_pred 0.0 --lr_schedule anneal --anneal_steps 100000 --use_distillation --load_from mscoco_models/new_nar_model --trg_len_option predict --finetune_trg_len --max_offset 20

Citation

If you find the resources in this repository useful, please consider citing:

@article{Lee:18,
  author    = {Jason Lee and Elman Mansimov and Kyunghyun Cho},
  title     = {Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement},
  year      = {2018},
  journal   = {arXiv preprint arXiv:1802.06901},
}