aloriosa / gen_work_mem


Generative working memory in Transformer decoder

Home Page: https://arxiv.org/abs/2406.14213

Python 100.00%
transformer translation working-memory

gen_work_mem's Introduction

Generative working memory in Transformer decoder

This repo contains the implementation of a method for augmenting the Transformer with working memory in the decoder. The method was first presented in Extending Transformer Decoder with Working Memory for Sequence to Sequence Tasks and further discussed and analyzed in Complexity of Symbolic Representation in Working Memory of Transformer Correlates with the Complexity of a Task.
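The core idea can be illustrated with a minimal NumPy sketch (this is not the repo's actual code, and all names below are hypothetical): memory slots are extra embeddings concatenated to the decoder input sequence, so that decoder self-attention can read from and write to them alongside the target tokens.

```python
import numpy as np

def add_working_memory(decoder_input, mem_tokens):
    """Prepend memory slot embeddings to the decoder input sequence.

    decoder_input: (seq_len, d_model) target token embeddings
    mem_tokens:    (mem_size, d_model) memory slot embeddings
    Returns an augmented sequence of shape (mem_size + seq_len, d_model),
    which is then processed by the decoder layers as one sequence.
    """
    return np.concatenate([mem_tokens, decoder_input], axis=0)

d_model, mem_size, seq_len = 8, 4, 10
rng = np.random.default_rng(0)
memory = rng.normal(size=(mem_size, d_model))  # hypothetical memory slots
tokens = rng.normal(size=(seq_len, d_model))   # hypothetical target embeddings

augmented = add_working_memory(tokens, memory)
print(augmented.shape)  # (14, 8)
```

In the actual model the memory embeddings are trained jointly with the rest of the network; the sketch only shows how the decoder sequence is extended.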

Running the training and evaluation

  1. Install the required packages:
pip install -r requirements.txt
  2. Use the run.py script for initial training of the model on the TED dataset, followed by fine-tuning on the Open Subtitles dataset:
python run.py

Citation

@article{SAGIROVA202216,
title = {Complexity of symbolic representation in working memory of Transformer correlates with the complexity of a task},
journal = {Cognitive Systems Research},
volume = {75},
pages = {16-24},
year = {2022},
issn = {1389-0417},
doi = {10.1016/j.cogsys.2022.05.002},
url = {https://www.sciencedirect.com/science/article/pii/S1389041722000274},
author = {Alsu Sagirova and Mikhail Burtsev},
keywords = {Neuro-symbolic representation, Transformer, Working memory, Machine translation},
abstract = {Even though Transformers are extensively used for Natural Language Processing tasks, especially for machine translation, they lack an explicit memory to store key concepts of processed texts. This paper explores the properties of the content of symbolic working memory added to the Transformer model decoder. Such working memory enhances the quality of model predictions in machine translation task and works as a neural-symbolic representation of information that is important for the model to make correct translations. The study of memory content revealed that translated text keywords are stored in the working memory, pointing to the relevance of memory content to the processed text. Also, the diversity of tokens and parts of speech stored in memory correlates with the complexity of the corpora for machine translation task.}
}

gen_work_mem's People

Contributors

aloriosa
