


Focus on the Target’s Vocabulary: Masked Label Smoothing for Machine Translation

For the latest updates, please refer to this repo.

Hi, this is the source code of our paper "Focus on the Target’s Vocabulary: Masked Label Smoothing for Machine Translation", accepted at ACL 2022. You can find the paper in the root directory (it will be uploaded to arXiv soon).

Introduction

Label smoothing and vocabulary sharing are two widely used techniques in neural machine translation models. However, we argue that naively applying both techniques together can be conflicting and even lead to sub-optimal performance. When allocating the smoothed probability, original label smoothing treats source-side words that can never appear in the target language the same as real target-side words, which can bias the translation model. To address this issue, we propose Masked Label Smoothing (MLS), a new mechanism that masks the soft-label probability of source-side words to zero. Simple yet effective, MLS better integrates label smoothing with vocabulary sharing.
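As a rough, self-contained sketch of the idea (pure Python on a toy vocabulary, not the actual fairseq criterion shipped in this repo; the function name and arguments are ours):

```python
import math

def masked_label_smoothing_loss(logits, target_idx, target_mask, eps=0.1):
    """Toy sketch of Masked Label Smoothing. `target_mask[i]` is 1 iff
    token i can appear on the target side; the smoothed probability mass
    is spread over those tokens only, so source-only tokens get exactly
    zero soft-label mass."""
    # log-softmax over the toy vocabulary
    m = max(logits)
    z = sum(math.exp(x - m) for x in logits)
    lprobs = [(x - m) - math.log(z) for x in logits]

    nll = -lprobs[target_idx]  # usual cross-entropy term on the gold token

    # uniform smoothing restricted to target-side tokens
    # (original LS would average over the whole vocabulary here)
    n_target = sum(target_mask)
    smooth = -sum(lp for lp, keep in zip(lprobs, target_mask) if keep) / n_target

    return (1.0 - eps) * nll + eps * smooth
```

With `target_mask` set to all ones this reduces to standard label smoothing; zeroing the entries of source-only tokens gives the masked variant.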


Venn diagram showing the token distribution between languages.

Illustration of Masked Label Smoothing (bottom right) and Weighted Label Smoothing (upper right & bottom left)

Preparations

git clone [email protected]:chenllliang/MLS.git
cd MLS

conda create -n MLS python=3.7
conda activate MLS

cd fairseq # The MLS criterions live in fairseq's criterions sub-folder; you can find them there.
pip install --editable ./
pip install sacremoses

# Make sure you have compatible versions of PyTorch and CUDA; we use torch 1.9.0+cu111

We use mosesdecoder for tokenization, subword-nmt for BPE, and fairseq for the experiment pipelines. You need to clone the first two repos into ./Tools before the next step.

Preprocess

We provide pre-processed binary data for IWSLT14 DE-EN in the ../databin folder (unzip it and put the two unzipped folders under ../databin/; you can then jump to the next section).

If you plan to use your own dataset, you may refer to this script for preprocessing and parameter settings.

cd script
bash preprocess.sh ../data/dataset-src-tgt/ src tgt

If it runs successfully, two folders containing binary files will be saved in the databin folder.

Train with MLS and original LS

cd scripts
bash train_LS.sh # ends after 20 epochs with valid_best_bleu = 36.91

bash train_MLS.sh # ends after 20 epochs with valid_best_bleu = 37.16

The best validation checkpoint will be saved in the checkpoints folder for testing.

Get result on Test Set

cd scripts

bash generate.sh ../databin/iwslt14-de-en-joined-new ../checkpoints/de-en-LS-0.1 ../Output/de-en-ls-0.1.out # get BLEU4 = 35.20


bash generate.sh ../databin/iwslt14-de-en-joined-new ../checkpoints/de-en-MLS-0.1 ../Output/de-en-mls-0.1.out # get BLEU4 = 35.76

We have also uploaded the generated texts to the Output folder for reference.

Some Results on single GPU

| BLEU | IWSLT14 DE-EN | WMT16 RO-EN |
| --- | --- | --- |
| LS | dev: 36.91 / test: 35.20 | dev: 22.38 / test: 22.54 |
| MLS (Ours) | dev: 37.16 / test: 35.76 | dev: 22.72 / test: 22.89 |

Using Weighted Label Smoothing

You can change lp_beta, lp_gamma, and lp_eps in train_WLS.sh to control the weight distribution.

cd scripts

bash train_WLS.sh  # you should change the paths to the source, target, and joined vocabularies accordingly

The test procedure follows the previous section.
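As a loose illustration only (our hypothetical reading of weighted smoothing, not the paper's exact formula; the function name and arguments are invented), down-weighting rather than fully masking source-side tokens could look like:

```python
def weighted_soft_labels(vocab_size, source_only_ids, beta):
    """Hypothetical sketch: instead of masking source-only tokens to zero
    (MLS), give them a reduced smoothing weight `beta` in [0, 1] and
    renormalize. beta = 0 recovers the masked variant; beta = 1 recovers
    original uniform label smoothing."""
    weights = [beta if i in source_only_ids else 1.0 for i in range(vocab_size)]
    total = sum(weights)
    return [w / total for w in weights]  # normalized soft-label distribution
```

The actual weighting in train_WLS.sh is controlled by lp_beta, lp_gamma, and lp_eps; this sketch collapses it to a single parameter for clarity.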

Citation

If you find our work helpful, please kindly cite:

@inproceedings{chen2022focus,
   title={Focus on the Target’s Vocabulary: Masked Label Smoothing for Machine Translation},
   author={Chen, Liang and Xu, Runxin and Chang, Baobao},
   booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics},
   year={2022}
}

