GithubHelp home page GithubHelp logo

vishalbelsare / delight Goto Github PK

View Code? Open in Web Editor NEW

This project forked from sacmehta/delight

0.0 0.0 0.0 977 KB

DeLighT: Very Deep and Light-Weight Transformers

License: MIT License

Python 96.88% C++ 0.84% Cuda 1.93% Shell 0.12% Lua 0.22%

delight's Introduction

DeLighT: Very Deep and Light-weight Transformers

This repository contains the source code of our work on building efficient sequence models: DeFINE (ICLR'20) and DeLighT (preprint).

Table of contents

  1. Overview
  2. Requirements and installation
  3. Training, evaluation, and results
  4. Multiplication-addition operations
  5. Citation
  6. Acknowledgement
  7. Issues

Overview

In this repository, we share the source code of our paper DeLight, that delivers similar or better performance than transformer-based models with significantly fewer parameters. DeLighT more efficiently allocates parameters both (1) within each Transformer block using DExTra, a deep and light-weight transformation and (2) across blocks using block-wise scaling, that allows for shallower and narrower DeLighT blocks near the input and wider and deeper DeLighT blocks near the output. Overall, DeLighT networks are 2.5 to 4 times deeper than standard transformer models and yet have fewer parameters and operations. For details, see our papers: DeFINE and and DeLighT.

DeLighT unit

Requirements and Installation

  • PyTorch version >= 1.4.0
  • Python version >= 3.6
  • For training new models, you'll also need an NVIDIA GPU and NCCL
  • To use DeLighT, you need to install fairseq and develop locally:
git clone https://github.com/sacmehta/delight
cd delight
pip install --editable ./
  • For faster training install NVIDIA's apex library:
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
  --global-option="--deprecated_fused_adam" --global-option="--xentropy" \
  --global-option="--fast_multihead_attn" ./

Training, Evaluation, and Results

For training, evaluation, and results, see below links. To ease reproduction of our results, we also provide links to training logs.

Neural machine translation

Language Modeling

Multiplication-Addition Operations

We have added module profiling for both Transformer and DeLight networks. This can be enabled using --print-stats argument. A model summary will be printed (by default for 20 tokens), similar to below screenshot. To use larger sequence lengths for source and target for profiling statistics, you can use --src-len-ps and --tgt-len-ps flags.

Model statistics

Citation

If you find our work useful, please consider citing following works:

@misc{mehta2020delight,
    title={DeLighT: Very Deep and Light-weight Transformer},
    author={Sachin Mehta and Marjan Ghazvininejad and Srinivasan Iyer and Luke Zettlemoyer and Hannaneh Hajishirzi},
    year={2020},
    eprint={2008.00623},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
@inproceedings{mehta2019define,
  title={DeFINE: Deep Factorized Input Token Embeddings for Neural Sequence Modeling},
  author={Mehta, Sachin and Koncel-Kedziorski, Rik and Rastegari, Mohammad and Hajishirzi, Hannaneh},
  booktitle={International Conference on Learning Representations},
  year={2019}
}

Acknowledgements

We would like to thank Fairseq team for building easy-to-use sequence library.

Issues

Thanks for your interest in our work. For any issues, please raise a request.

delight's People

Contributors

sacmehta avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.