i-machine-think / compositionality_paradox_mt

Codebase for analysing compositional generalisation in NMT models, which allows you to run systematicity, productivity, substitutivity and idiom processing analyses.

The paradox of the compositionality of natural language: a neural machine translation case study

This repository contains the data and evaluation scripts for the following paper:

@inproceedings{dankers2022paradox,
  title={The Paradox of the Compositionality of Natural Language: A Neural Machine Translation Case Study},
  author={Dankers, Verna and Bruni, Elia and Hupkes, Dieuwke},
  booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages={4154--4175},
  year={2022}
}

In this paper, we present tests to evaluate compositionality "in the wild". As a case study, we consider the compositional behaviour of English-Dutch NMT models. This repository contains five folders, with their own READMEs:

  • data: this folder contains the generated src files for the synthetic and semi-natural data types that form the backbone of our tests, along with the vocabulary used to generate the synthetic data templates.
  • overgeneralisation: this folder contains synthetic, semi-natural and natural data with idioms, used to assess whether a model overgeneralises or provides an idiomatic translation.
  • substitutivity: this folder contains English data with synonym replacements for the synthetic, semi-natural and natural data templates.
  • systematicity: this folder contains synthetic and semi-natural data used to assess systematicity of recombinations of noun and verb phrases (s_np_vp) and sentences with a conjunction (s_conj). For the latter setup, there is natural data too.
  • scripts: this folder contains various scripts that facilitate using the data.

Usage

  • If you are curious about our synthetic, semi-natural and natural data sources (see Section 3 of the paper), go to the data folder.
  • If you are looking to run a specific test, go to the folder of the experiment of interest:
    1. For Section 4.1 / systematicity, go to systematicity.
    2. For Section 4.2 / substitutivity, go to substitutivity.
    3. For Section 4.3 / global compositionality, go to overgeneralisation.
  • If you would like to use our evaluation or visualisation scripts, go to scripts.

Note that the models referred to as small, medium and full in the paper are referred to as tiny, small, all in the repository.
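
The naming difference above can be handled with a small lookup when matching results in the paper to files in the repository. The following is a minimal sketch, not part of the repository's scripts: the names `PAPER_TO_REPO`, `REPO_TO_PAPER`, `to_repo_name` and `to_paper_name` are hypothetical helpers introduced only for illustration.

```python
# Hypothetical helper, not part of the repository: translates between the
# model-size names used in the paper and those used in the repository.
PAPER_TO_REPO = {
    "small": "tiny",    # paper "small"  -> repository "tiny"
    "medium": "small",  # paper "medium" -> repository "small"
    "full": "all",      # paper "full"   -> repository "all"
}

# Inverse mapping, derived from the one above so the two never disagree.
REPO_TO_PAPER = {repo: paper for paper, repo in PAPER_TO_REPO.items()}


def to_repo_name(paper_name: str) -> str:
    """Return the repository name for a model size as named in the paper."""
    return PAPER_TO_REPO[paper_name.lower()]


def to_paper_name(repo_name: str) -> str:
    """Return the paper name for a model size as named in the repository."""
    return REPO_TO_PAPER[repo_name.lower()]
```

Note that "small" appears in both naming schemes with different meanings, so it helps to always state which scheme a name comes from: `to_repo_name("small")` gives the repository's `tiny`, whereas `to_paper_name("small")` gives the paper's `medium`.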
