GithubHelp home page GithubHelp logo

dumpmemory / fully-nat Goto Github PK

View Code? Open in Web Editor NEW

This project forked from shawnkx/fully-nat

0.0 0.0 0.0 3.17 MB

Shell 1.58% C++ 0.60% Python 95.80% Lua 0.15% Cuda 1.43% Makefile 0.02% Batchfile 0.03% Cython 0.38%

fully-nat's Introduction

Fully Non-autoregressive Neural Machine Translation

This page mainly includes instructions for reproducing results from the following papers

Installation

This code is based on an old version of Fairseq, which has changed significantly after the above paper was published. We previously found it was not easy to merge our code to the up-to-date fairseq version, so we decided to open-source with a saperate code.

To start with,

git clone --recursive [email protected]:shawnkx/Fully-NAT.git

(Optional) ctcdecode is used for advanced decoding with beam-search and language model. Installation follows the following instruction

cd ctcdecode; pip install .

Then,

cd fairseq-stable
python setup.py build_ext --inplace

This code is tested on pytorch==1.7.1, cuda 10.2.

Dataset

We follow the instructions here. You can also download the preprocessed datasets as follows:

Direction Download
Ro <-> En download (.zip)
En <-> De download (.zip)
Ja --> En download (.zip)

Pre-trained Models

Ro-En

Direction Model BLEU Direction Model BLEU
Ro -> En roen.distill.ctc 34.07 En -> Ro enro.distill.ctc 33.41
Ro -> En roen.distill.ctc.glat 34.16 En -> Ro enro.distill.ctc.glat 33.71
Ro -> En roen.distill.ctc.vae 33.87 En -> Ro enro.distill.ctc.vae 33.79

En-De

Direction Model BLEU Direction Model BLEU
De -> En deen.distill.ctc 30.46 En -> De ende.distill.ctc 26.51
De -> En deen.distill.ctc.glat 31.37 En -> De ende.distill.ctc.glat 27.20
De -> En deen.distill.ctc.vae 31.10 En -> De ende.distill.ctc.vae 27.49

Ja-En

Direction Model BLEU BLEU + 4gram-LM + beam20
Ja -> En jaen.distill.ctc.vae 18.73 21.41 (Download 4-gram En-LM)

Train a model

The following command will train a baseline NAT model with CTC loss.

python train.py ${databin_dir} \
    --fp16 \
    --left-pad-source False --left-pad-target False \
    --arch cmlm_transformer_ctc --task translation_lev \
    --noise 'full_mask' --valid-noise 'full_mask' \
    --dynamic-upsample --src-upsample 3 \
    --decoder-learned-pos --encoder-learned-pos \
    --apply-bert-init --share-all-embeddings \
    --optimizer adam --adam-betas '(0.9, 0.999)' --adam-eps 1e-06 \
    --clip-norm 2.4 --dropout 0.3 --lr-scheduler inverse_sqrt \
    --warmup-init-lr 1e-07 --warmup-updates 10000 --lr 0.0005 --min-lr 1e-09 \
    --criterion nat_loss --predict-target 'all' --loss-type 'ctc' \
    --axe-eps --force-eps-zero \
    --label-smoothing 0.1 --weight-decay 0.01 \
    --max-tokens 4096 --update-freq 1 \
    --max-update 300000 --save-dir ${model_dir} --save-interval-updates 5000 \
    --no-epoch-checkpoints --keep-interval-updates 10 --keep-best-checkpoints 5 \
    --seed 2 --log-interval 100 --no-progress-bar \
    --eval-bleu --eval-bleu-args '{"iter_decode_max_iter":0,"iter_decode_collapse_repetition":true}' \
    --eval-bleu-detok 'space' \
    --eval-tokenized-bleu --eval-bleu-remove-bpe '@@ ' --eval-bleu-print-samples \
    --best-checkpoint-metric bleu --maximize-best-checkpoint-metric \
    --tensorboard-logdir ${ckpt_dir}/tensorboard/${expname} 

Use the --latent-dim 8 flag to add the latent variables (VAEs).
Use --glat-sampling-ratio 0.5 --poisson-mask --replace-target-embed to add the GLAT training.

Translate

For inference,

python fairseq_cli/generate.py ${databin_dir} \
    --task translation_lev \
    --path checkpoints/checkpoint_best.pt \
    --gen-subset test \
    --axe-eps  --iter-decode-collapse-repetition --force-eps-zero \
    --left-pad-source False --left-pad-target False \
    --iter-decode-max-iter 0 --beam 1 \
    --remove-bpe --batch-size 200 \

In the end of the generation, we can see the tokenized BLEU score for the translation. For datasets based on BPEs (Ro-En, En-De), we report standard tokenized BLEU, for datasets using sentencepiece (Ja-En), we report sacre-bleu by replacing --remove-bpe as --remove-bpe sentencepiece --scoring sacrebleu.

Set --batch-size 1 to compute the latency when decoding one sentence at a time.

Advanced Decoding Methods

CTC Beam Search

We can use CTC Beam Search (+ language model) to improve the translation quality

python fairseq_cli/generate.py ${databin_dir} \
    --task translation_lev \
    --path checkpoints/checkpoint_best.pt \
    --gen-subset test \
    --axe-eps  --iter-decode-collapse-repetition --force-eps-zero \
    --left-pad-source False --left-pad-target False \
    --iter-decode-max-iter 0 --beam 1 \
    --remove-bpe sentencepiece --scoring sacrebleu \
    --batch-size 20 \
    --model-overrides "{'use_ctc_beamsearch':True, 'ctc_bs_beam':${ctc_beam_size}, 'ctc_bs_alpha':${alpha}, 'ctc_bs_beta':${beta}, 'ctc_bs_lm_path':'${lm_path}'}"

For example, for Ja-En model, we use alpha=0.4, beta=2.2, ctc_beam_size=20
If ${lm_path} is None, we apply CTC beam search without language model.

Citation

@inproceedings{gu-kong-2021-fully,
    title = "Fully Non-autoregressive Neural Machine Translation: Tricks of the Trade",
    author = "Gu, Jiatao  and
      Kong, Xiang",
    booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.findings-acl.11",
    doi = "10.18653/v1/2021.findings-acl.11",
    pages = "120--133",
}

fully-nat's People

Contributors

multipath avatar shawnkx avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.