
Exploring Indirect Knowledge Transfer in Multilingual Machine Translation Through Targeted Distillation

Course project of LDA-T313 Approaches to Natural Language Understanding

Zihao Li & Chao Wang

Overview

This repository contains the implementation of the project "Exploring Indirect Knowledge Transfer in Multilingual Machine Translation Through Targeted Distillation". The project aims to investigate the efficiency of cross-linguistic knowledge transfer in multilingual Neural Machine Translation (NMT) using knowledge distillation techniques.

Objectives

The study focuses on two main objectives:

  1. Cross-Linguistic Knowledge Transfer: Evaluate how effectively student models trained on one language perform in translating other related languages within the same language family.
  2. Correlation of Language Similarity with Transfer Effectiveness: Investigate whether the effectiveness of cross-linguistic knowledge transfer correlates with the degree of linguistic similarity among languages.

Methodology

Teacher Models

We utilize two pre-trained multilingual NMT models from the Helsinki-NLP OPUS-MT project:

  • opus-mt-tc-big-gmq-en: Translates Danish, Norwegian, and Swedish into English.
  • opus-mt-tc-big-zle-en: Translates Belarusian, Russian, and Ukrainian into English.

Training Datasets

Our training datasets are derived from the NLLB corpus and filtered for quality with OpusFilter. Each language pair (always paired with English) contributes 5 million parallel sentences.
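The filtering step can be illustrated with a minimal sketch. The actual pipeline uses OpusFilter; the specific filters and thresholds below (sentence length and length ratio) are illustrative assumptions, not the project's real configuration.

```python
# Minimal illustration of the kind of parallel-corpus filtering OpusFilter
# performs. The filters and thresholds are illustrative assumptions only.

def keep_pair(src: str, tgt: str,
              max_len: int = 100, max_ratio: float = 3.0) -> bool:
    """Keep a sentence pair if both sides are non-empty, not too long,
    and their token-length ratio is plausible."""
    src_tokens, tgt_tokens = src.split(), tgt.split()
    if not src_tokens or not tgt_tokens:
        return False
    if len(src_tokens) > max_len or len(tgt_tokens) > max_len:
        return False
    ratio = len(src_tokens) / len(tgt_tokens)
    return 1 / max_ratio <= ratio <= max_ratio

pairs = [
    ("Jeg elsker dig", "I love you"),           # kept
    ("Hej", ""),                                # dropped: empty target
    ("et to tre fire fem seks syv otte", "1"),  # dropped: length ratio 8.0
]
filtered = [(s, t) for s, t in pairs if keep_pair(s, t)]
```

A real run would chain several such filters (language identification, character-script checks, deduplication) over the raw NLLB data before sampling the 5 million pairs.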

Distillation Process

The distillation process uses the outputs of pre-trained teacher models as the target translations for training smaller student models.
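This setup amounts to sequence-level knowledge distillation: the teacher's translations replace human references as the student's training targets. A minimal sketch follows, with a stub standing in for the OPUS-MT teacher (`teacher_translate` and the toy sentences are hypothetical; a real pipeline would run beam search with the teacher model).

```python
# Sequence-level knowledge distillation, sketched: the teacher model's
# translations become the targets in the student's training data.
# `teacher_translate` is a stub standing in for an OPUS-MT teacher.

def teacher_translate(src: str) -> str:
    # Stub: a real pipeline would decode with the teacher via beam search.
    toy_lexicon = {"hej": "hello", "verden": "world"}
    return " ".join(toy_lexicon.get(w, w) for w in src.lower().split())

def build_distillation_set(source_sentences):
    """Pair each source sentence with the teacher's output translation."""
    return [(src, teacher_translate(src)) for src in source_sentences]

train_pairs = build_distillation_set(["Hej verden", "Hej"])
# The student is then trained on (source, teacher_output) pairs rather than
# (source, human_reference) pairs.
```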

The training pipeline is driven by `train.sh`.

Model Configurations

| Parameter                      | Teacher | Student |
| ------------------------------ | ------- | ------- |
| Embedding dimension            | 1024    | 256     |
| Attention heads                | 16      | 8       |
| Feed-forward network dimension | 4096    | 2048    |
| Hidden layers                  | 6       | 3       |
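As a rough sanity check on the size reduction, per-layer parameter counts can be estimated from these dimensions. The formula below counts only the attention projections (~4·d²) and feed-forward matrices (~2·d·ffn) per layer, ignoring embeddings, biases, and layer norms; it is an order-of-magnitude approximation, not the project's exact accounting.

```python
# Rough per-stack parameter estimate for a Transformer:
# attention projections (Q, K, V, output) contribute ~4*d^2 parameters per
# layer, the feed-forward block ~2*d*ffn. Embeddings, biases, and layer
# norms are ignored, so these are order-of-magnitude figures only.

def approx_params(d_model: int, ffn: int, layers: int) -> int:
    per_layer = 4 * d_model**2 + 2 * d_model * ffn
    return per_layer * layers

teacher = approx_params(d_model=1024, ffn=4096, layers=6)
student = approx_params(d_model=256, ffn=2048, layers=3)
print(f"teacher ~ {teacher / 1e6:.1f}M, student ~ {student / 1e6:.1f}M params per stack")
```

Under these assumptions the student carries roughly 1/19 of the teacher's weight-matrix parameters per stack, which is what makes the distilled model attractive to deploy.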

Evaluation Metrics

The models are evaluated using BLEU and COMET metrics to measure translation accuracy and fluency.
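For intuition, BLEU combines modified n-gram precisions with a brevity penalty. A minimal standard-library sketch follows; the actual evaluation would use the sacrebleu and COMET toolkits, which also handle tokenization and statistical details omitted here.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU over whitespace tokens: geometric mean of
    modified n-gram precisions, scaled by a brevity penalty."""
    hyp_len = ref_len = 0
    matches = [0] * max_n
    totals = [0] * max_n
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            # Counter intersection implements the "clipped" n-gram counts.
            matches[n - 1] += sum((ngrams(h, n) & ngrams(r, n)).values())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if 0 in totals or 0 in matches:
        return 0.0
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_prec)

score = corpus_bleu(["the cat sat on the mat"], ["the cat sat on the mat"])
# A perfect match scores 1.0 (often reported as 100).
```

COMET, by contrast, is a learned metric: a pretrained model scores each (source, hypothesis, reference) triple, which is why it tracks adequacy and fluency better than surface n-gram overlap.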

Testing Datasets

The student models are evaluated on the Tatoeba Translation Challenge and FLORES-200 test sets.

Key Findings

  1. General Performance: Student models perform worse on languages they were not directly trained on, with the decline most pronounced among the East Slavic languages.
  2. Lexical Similarity Impact: Students trained on languages lexically closer to the evaluation language translate it more accurately, an effect most evident among the North Germanic languages.
