attention

PyTorch implementation of various attention-based language models

Purpose

Implements attention-based language models in a unified code structure while assuring code accuracy. PRs are warmly welcomed!

How to run

Set up env

pip3 install -r requirements.txt

Prepare dataset

First, download the required spaCy language models:

python -m spacy download en
python -m spacy download de
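
Note: spaCy v3 removed the en and de shorthand names. If the commands above fail, download the equivalent small pipelines directly:

python -m spacy download en_core_web_sm
python -m spacy download de_core_news_sm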

Then, download the dataset for the attention model you want to run. Currently, transformer and bert are supported:

sh download.sh ${model}
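
For example, to fetch the data for the transformer run below:

sh download.sh transformer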

Run language model

You can run the implemented language models via the attention/run_attention_model.py file. An example of training the transformer model is given below.

Detailed parameters for each model are given in the next section.

python3 attention/run_attention_model.py \
	--language_model transformer \
	--data_pkl .data/multi30k/m30k_deen_shr.pkl \
	--d_model 512 \
	--d_word_vec 512 \
	--d_inner_hid 2048 \
	--d_k 64 \
	--d_v 64 \
	--n_head 8 \
	--n_layers 6 \
	--batch_size 256 \
	--embs_share_weight \
	--proj_share_weight \
	--label_smoothing \
	--output_dir output \
	--no_cuda \
	--n_warmup_steps 128000 \
	--epoch 400
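
For reference, the n_warmup_steps flag follows the learning-rate schedule from the original Transformer paper ("Attention Is All You Need"): linear warmup followed by inverse-square-root decay. A minimal sketch of that schedule, which may differ in details from this repository's implementation:

def transformer_lr(step, d_model=512, n_warmup_steps=128000):
    # Noam schedule: linear warmup, then decay proportional to step ** -0.5.
    step = max(step, 1)  # guard against 0 ** -0.5
    return d_model ** -0.5 * min(step ** -0.5, step * n_warmup_steps ** -1.5)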

Currently, the transformer and bert models are supported. The arguments for each model are listed below.

List of implemented models and related parameters

| Model | Reference | Run example |
| --- | --- | --- |
| Transformer | link | link |
| Bert | link | link |

Transformer

Set the --language_model parameter to transformer.

python3 attention/run_attention_model.py \
	--language_model transformer \
	--data_pkl .data/multi30k/m30k_deen_shr.pkl \
	--d_model 512 \
	--d_word_vec 512 \
	--d_inner_hid 2048 \
	--d_k 64 \
	--d_v 64 \
	--n_head 8 \
	--n_layers 6 \
	--batch_size 256 \
	--embs_share_weight \
	--proj_share_weight \
	--label_smoothing \
	--output_dir output \
	--no_cuda \
	--n_warmup_steps 128000 \
	--epoch 400
Parameter explanations

| Parameter | Explanation |
| --- | --- |
| language_model | Name of the language model; set to transformer here |
| data_pkl | Path to the preprocessed pickle file |
| d_model | Projection dimension of q, k, v |
| d_word_vec | Dimension of the word vectors |
| d_inner_hid | Hidden dimension of the position-wise feed-forward network |
| d_k | Dimension of each key head (d_model = d_k * n_head) |
| d_v | Dimension of each value head (d_model = d_v * n_head) |
| n_head | Number of attention heads |
| n_layers | Number of encoder/decoder layers |
| batch_size | Size of a batch |
| embs_share_weight | Whether to share embedding weights between the source and target vocabularies; if set, both vocabulary sizes become the same |
| proj_share_weight | Whether to share weights between the target embedding and the final projection layer |
| label_smoothing | Whether to apply label smoothing to the cross-entropy loss |
| output_dir | Directory for training history and checkpoints |
| no_cuda | Disable CUDA and run on CPU |
| n_warmup_steps | Number of learning-rate warmup steps |
| epoch | Number of training epochs |
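
To make the d_k and d_v constraints concrete, here is a small shape check (illustrative only, not code from this repository):

import torch

batch, seq_len, d_model, n_head = 2, 10, 512, 8
d_k = d_model // n_head  # 64, matching the run example above

x = torch.randn(batch, seq_len, d_model)
w_q = torch.nn.Linear(d_model, n_head * d_k, bias=False)  # projects q for all heads at once
q = w_q(x).view(batch, seq_len, n_head, d_k).transpose(1, 2)
print(q.shape)  # torch.Size([2, 8, 10, 64]): one d_k-dimensional query per head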

Bert

Set the --language_model parameter to bert.

python3 attention/run_attention_model.py \
    --language_model bert \
    --d_model 512 \
    --d_word_vec 512 \
    --d_inner_hid 2048 \
    --d_k 64 \
    --d_v 64 \
    --n_head 8 \
    --n_layers 6 \
    --batch_size 256 \
    --output_dir output \
    --no_cuda \
    --epoch 400 \
    --movie_conversations ./data/movie_conversations.txt \
    --movie_lines ./data/movie_lines.txt \
    --raw_text ./data \
    --output ./data
Parameter explanations

| Parameter | Explanation |
| --- | --- |
| language_model | Name of the language model; set to bert here |
| d_model | Projection dimension of q, k, v |
| d_word_vec | Dimension of the word vectors |
| d_inner_hid | Hidden dimension of the position-wise feed-forward network |
| d_k | Dimension of each key head (d_model = d_k * n_head) |
| d_v | Dimension of each value head (d_model = d_v * n_head) |
| n_head | Number of attention heads |
| n_layers | Number of encoder layers |
| batch_size | Size of a batch |
| no_cuda | Disable CUDA and run on CPU |
| n_warmup_steps | Number of learning-rate warmup steps |
| epoch | Number of training epochs |
| movie_conversations | Path to the movie_conversations.txt file |
| movie_lines | Path to the movie_lines.txt file |
| raw_text | Directory where the preprocessed text will be saved |
| output | Directory where results will be saved |
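
For context, BERT pretraining on this movie-dialog data relies on the masked-language-model objective. Below is a sketch of the standard masking rule from the BERT paper (15% of tokens selected); the repository's preprocessing may differ:

import random

def mask_tokens(token_ids, mask_id, vocab_size, p=0.15):
    # Of the selected tokens: 80% become [MASK], 10% a random token, 10% stay unchanged.
    inputs, labels = list(token_ids), [-100] * len(token_ids)  # -100 is ignored by the loss
    for i, tok in enumerate(token_ids):
        if random.random() < p:
            labels[i] = tok
            r = random.random()
            if r < 0.8:
                inputs[i] = mask_id
            elif r < 0.9:
                inputs[i] = random.randrange(vocab_size)
    return inputs, labels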

How to run pytest

pytest

Issues

Implementation of continuous integration

Summary of feature

When a PR is opened, continuous integration should run pytest and verify that it passes.

Work list

  • Investigate GitHub Actions
  • Decide which Docker image to use

Bert implementation

Summary of feature

  • Implement BERT using the same code structure as the transformer

Work list

  • Summarize the BERT paper
  • Analyze the reference BERT PyTorch code
  • Write the BERT code following our repository's template

Inference method from pretrained models

Summary of feature

An inference method using pretrained models should be implemented. A minimal decoding sketch is given after the work list.

Work list

  • Inference method for the transformer model
  • Inference method for the BERT model
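
As a starting point, here is a minimal greedy-decoding sketch; the model interface is hypothetical and the repository's actual API may differ:

import torch

@torch.no_grad()
def greedy_decode(model, src, bos_id, eos_id, max_len=64):
    # Extend the target one token at a time with the argmax of the next-token logits.
    ys = torch.tensor([[bos_id]])
    for _ in range(max_len):
        logits = model(src, ys)  # hypothetical signature: (src, tgt) -> [1, tgt_len, vocab]
        next_id = logits[0, -1].argmax().item()
        ys = torch.cat([ys, torch.tensor([[next_id]])], dim=1)
        if next_id == eos_id:
            break
    return ys.squeeze(0).tolist()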

GPT implementation

Summary of feature

  • Implement minGPT based on Karpathy's code

Work list

  • Summarize the GPT paper
  • Analyze the GPT PyTorch code
  • Write the code following the same template as the rest of the repository
