KTNMT

Introduction

Source code for the ACL 2023 main conference long paper "Knowledge Transfer in Incremental Learning for Multilingual Neural Machine Translation" (Outstanding Paper Award)

In this work, we propose a knowledge transfer method that can efficiently adapt original MNMT models to diverse incremental language pairs.


Get Started

(Core) Data Preprocessing.

Standard BPE procedure: follow https://github.com/google/sentencepiece, using 64k merged BPE tokens.

After obtaining the original vocabulary and the incremental vocabulary, you must also obtain the incremental vocabulary for the external model.
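
The vocabulary step above can be sketched as follows. This is a minimal illustration, not the script shipped with this repository: it assumes fairseq-style dictionaries with one `<token> <count>` entry per line, and the helper names (`parse_dict`, `incremental_tokens`, `combine`) and the toy tokens are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_dict(lines: List[str]) -> Dict[str, int]:
    """Parse dictionary lines of the assumed form '<token> <count>'."""
    vocab: Dict[str, int] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip empty lines
        token, _, count = line.rpartition(" ")
        vocab[token] = int(count)
    return vocab


def incremental_tokens(original: Dict[str, int],
                       new: Dict[str, int]) -> List[Tuple[str, int]]:
    """Tokens that occur in `new` but not in `original`, in `new` order."""
    return [(tok, cnt) for tok, cnt in new.items() if tok not in original]


def combine(original: Dict[str, int],
            new: Dict[str, int]) -> List[Tuple[str, int]]:
    """Keep the original entries first (so their indices are unchanged),
    then append only the incremental tokens."""
    return list(original.items()) + incremental_tokens(original, new)


# Toy data standing in for real dictionary files (hypothetical tokens).
original_vocab = parse_dict(["the 100", "a 80", "ing 50"])
new_vocab = parse_dict(["the 30", "pry 25", "ing 10", "vit 7"])

added = incremental_tokens(original_vocab, new_vocab)
combined = combine(original_vocab, new_vocab)

for token, count in combined:
    print(f"{token} {count}")
```

Keeping the original entries at the front means that embeddings already learned for those tokens can keep their positions, while the appended tokens are the only ones that need new parameters. The same difference can be computed against the external model's dictionary.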

Model Training:

This system has been tested in the following environment.

Python version == 3.7

PyTorch version == 1.8.0

Fairseq version == 0.12.0 (pip install fairseq)

Note that this environment only affects the training of the original and incremental models. You can use another deep learning library for model training if you prefer.


Incremental Learning

We build the incremental learning procedure for Multilingual Neural Machine Translation as follows:

  1. Obtain the original multilingual translation model (or train one yourself).

  2. Preprocess the incremental data:

     • Clean the data (optional).
     • Build the vocabulary (optional; follow the standard BPE procedure).

  3. Run incremental training. All running scripts are provided in the "src" folder. Here is an example:
    echo "Task format: "
    export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
    export SEED=222
    export CHECKPOINT_PATH='' # directory where the new (incremental) model is saved
    export CL_CHECKPOINT_PATH='' # checkpoint of the old (original) model, passed to --finetune-from-model
    export DATA_PATH='' # binarized incremental data (data-bin directory)
    export USER_PATH='mkt/' # user directory, passed to --user-dir
    export ARCH=transformer_adapter
    export TASK=translation_multi_adapter
    export ADAPTER_MODULE=serial # one of: serial / parallel / parallel_gate / parallel_norm / parallel_dp
    export KRAIL_MODULE='' # one of: krail / krail_ffn / krail_map / krail_map_last / krail_map_avg / krail_adaptive / krail_adaptive_final
    export LANG_DICT='misc/cl_adapter/lang_dicts_uk.txt' # new language dictionary
    export KRAIL_MODEL='mPLM/418M_last_checkpoint.pt' # path to the external model
    export NEW_DICT_PATH='misc/cl_adapter/model_dict.m2m100.vex.txt' # combined model dictionary (can be generated with scripts)
    export KRAIL_DICT_PATH='misc/m2m100/model_dict.128k.txt' # dictionary of the external model
    export lang_pairs=en-uk
    export CLIP_NORM=0.0
    export OPTIMIZER=adam
    export ADAM_EPS=1e-9
    export LR=5e-4
    export LR_SCHEDULER=inverse_sqrt
    export WARMUP=4000
    export DROPOUT=0.2
    export ATT_DROPOUT=0.2
    export WEIGHT_DECAY=0.0001
    export CRITERION=label_smoothed_cross_entropy
    export LABEL_SMOOTHING=0.1
    export MAX_TOKENS=1024
    export UPDATE_FREQ=8
    export SAVE_INTERVAL_UPDATES=2500
    export KEEP_INTERVAL_UPDATES=1
    export MAX_UPDATE=500000
    
    fairseq-train $DATA_PATH \
    --user-dir $USER_PATH \
    --share-all-embeddings \
    --encoder-normalize-before --decoder-normalize-before \
    --encoder-embed-dim 1024 --encoder-ffn-embed-dim 8192 --encoder-attention-heads 16 \
    --decoder-embed-dim 1024 --decoder-ffn-embed-dim 8192 --decoder-attention-heads 16 \
    --encoder-layers 24 --decoder-layers 24 \
    --left-pad-source False --left-pad-target False \
    --arch $ARCH \
    --task $TASK \
    --sampling-method temperature \
    --sampling-temperature 5 \
    --lang-tok-style multilingual \
    --lang-dict $LANG_DICT \
    --lang-pairs $lang_pairs \
    --encoder-langtok src \
    --decoder-langtok \
    --clip-norm $CLIP_NORM \
    --optimizer $OPTIMIZER \
    --adam-betas '(0.9, 0.98)' \
    --adam-eps $ADAM_EPS \
    --lr $LR \
    --lr-scheduler $LR_SCHEDULER \
    --warmup-updates $WARMUP \
    --dropout $DROPOUT \
    --attention-dropout $ATT_DROPOUT \
    --weight-decay $WEIGHT_DECAY \
    --criterion $CRITERION \
    --label-smoothing $LABEL_SMOOTHING \
    --max-tokens $MAX_TOKENS \
    --save-dir $CHECKPOINT_PATH/checkpoints/ \
    --update-freq $UPDATE_FREQ \
    --save-interval-updates $SAVE_INTERVAL_UPDATES \
    --keep-interval-updates $KEEP_INTERVAL_UPDATES \
    --max-update $MAX_UPDATE \
    --no-epoch-checkpoints \
    --seed $SEED --log-format simple --log-interval 300 \
    --patience 10 \
    --tensorboard-logdir $CHECKPOINT_PATH/logs/ \
    --no-progress-bar \
    --ddp-backend no_c10d \
    --finetune-from-model $CL_CHECKPOINT_PATH \
    --freeze-all \
    --adapter-inner-dims 4096 \
    --adapter-module $ADAPTER_MODULE \
    --krail-module $KRAIL_MODULE \
    --vocab-adapter \
    --base-embed-dim 32211 \
    --krail-model $KRAIL_MODEL \
    --new-dict-path $NEW_DICT_PATH \
    --krail-dict-path $KRAIL_DICT_PATH
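
To make the hyperparameters in the example easier to reason about, the sketch below computes the maximum number of tokens per optimizer step and the learning rate at a few steps. It is a simplified illustration under stated assumptions, not code from this repository: the function names are hypothetical, the schedule is a plain reading of the inverse square root scheduler (linear warmup, then decay proportional to 1/sqrt(step)), and the initial warmup learning rate of 0 is an assumption.

```python
import math

# Values copied from the example script.
NUM_GPUS = 8        # CUDA_VISIBLE_DEVICES=0,...,7
MAX_TOKENS = 1024   # --max-tokens (per GPU, per forward/backward pass)
UPDATE_FREQ = 8     # --update-freq (gradient accumulation steps)
LR = 5e-4           # --lr
WARMUP = 4000       # --warmup-updates


def tokens_per_update(num_gpus: int, max_tokens: int, update_freq: int) -> int:
    """Upper bound on the tokens that contribute to one optimizer step."""
    return num_gpus * max_tokens * update_freq


def inverse_sqrt_lr(step: int, peak_lr: float = LR, warmup: int = WARMUP,
                    init_lr: float = 0.0) -> float:
    """Linear warmup from init_lr to peak_lr, then 1/sqrt(step) decay.

    init_lr=0.0 is an assumption made for this sketch.
    """
    if step < warmup:
        return init_lr + (peak_lr - init_lr) * step / warmup
    return peak_lr * math.sqrt(warmup / step)


budget = tokens_per_update(NUM_GPUS, MAX_TOKENS, UPDATE_FREQ)
print(f"max tokens per update: {budget}")
for step in (2000, 4000, 16000):
    print(f"step {step}: lr = {inverse_sqrt_lr(step):.6f}")
```

If fewer GPUs are available, raising `UPDATE_FREQ` by the same factor keeps the token budget per update roughly constant.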

Inference & Evaluation

Please refer to src/inference.sh and run_sh/evaluate.sh.


Citation

@inproceedings{huang-etal-2023-knowledge,
    title = "Knowledge Transfer in Incremental Learning for Multilingual Neural Machine Translation",
    author = "Huang, Kaiyu  and
      Li, Peng  and
      Ma, Jin  and
      Yao, Ting  and
      Liu, Yang",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.852",
    doi = "10.18653/v1/2023.acl-long.852",
    pages = "15286--15304",
}
