pytorch-transformer's Introduction

A PyTorch Implementation of the Transformer: Attention Is All You Need

Our implementation is largely based on the TensorFlow implementation by Kyubyong (https://github.com/Kyubyong/transformer).

Requirements

Why This Project?

I'm new to PyTorch, so I tried implementing some projects with it. Recently, I read the paper Attention Is All You Need and was impressed by the idea, so here it is. I got results similar to the original TensorFlow implementation.

Differences with the original paper

I don't intend to replicate the paper exactly. Rather, I aim to implement the main ideas of the paper and verify them in a SIMPLE and QUICK way. In this respect, some parts of my code differ from the paper. Among them:

  • I used the IWSLT 2016 de-en dataset, not the WMT dataset, because the former is much smaller and requires no special preprocessing.
  • I constructed the vocabulary from words, not subwords, for simplicity. Of course, you can try BPE or word-piece if you want.
  • I parameterized the positional encoding. The paper used a sinusoidal formula, but Noam Shazeer, one of the authors, says both work; see the discussion on Reddit. (A minimal sketch of a learned positional encoding follows this list.)
  • The paper adjusted the learning rate by global step. I fixed the learning rate at a small constant, 0.0001, simply because training was reasonably fast with the small dataset (only a couple of hours on a single GTX 1060!).
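
For illustration, here is a minimal sketch of a parameterized (learned) positional encoding in PyTorch. The class name and shapes are assumptions for this sketch, not the repository's actual modules.py code.

```python
import torch
import torch.nn as nn

class LearnedPositionalEncoding(nn.Module):
    """Positional encoding as a trainable embedding table,
    instead of the paper's fixed sinusoidal formula."""

    def __init__(self, maxlen, num_units):
        super().__init__()
        # one trainable vector per position, up to maxlen
        self.pos_emb = nn.Embedding(maxlen, num_units)

    def forward(self, x):
        # x: (batch, seq_len, num_units); seq_len must not exceed maxlen
        positions = torch.arange(x.size(1), device=x.device)  # (seq_len,)
        return x + self.pos_emb(positions)  # broadcasts over the batch
```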

File description

  • hyperparams.py includes all the hyperparameters that are needed.
  • prepro.py creates vocabulary files for the source and the target.
  • data_load.py contains functions regarding loading and batching data.
  • modules.py has all building blocks for encoder/decoder networks.
  • train.py defines the model and runs training.
  • eval.py is for evaluation.

Training

  • STEP 1. Download the IWSLT 2016 German-English parallel corpus and move it into the corpora folder:
wget -qO- https://wit3.fbk.eu/archive/2016-01//texts/de/en/de-en.tgz | tar xz; mv de-en corpora
  • STEP 2. Adjust the hyperparameters in hyperparams.py if necessary.
  • STEP 3. Run prepro.py to generate vocabulary files in the preprocessed folder.
  • STEP 4. Run train.py, or download the pretrained weights, put them into the './models/' folder, and change eval_epoch in hyperparams.py to 18.
  • STEP 5. View loss and accuracy in TensorBoard:
tensorboard --logdir runs

Evaluation

  • Run eval.py.

Results

I got a BLEU score of 16.7 (the TensorFlow implementation reports 17.14). (Recall that I trained on a small dataset with a limited vocabulary.) Some of the evaluation results are as follows; details are available in the results folder.

source: Ich bin nicht sicher was ich antworten soll
expected: I'm not really sure about the answer
got: I'm not sure what I'm going to answer

source: Was macht den Unterschied aus
expected: What makes his story different
got: What makes a difference

source: Vielen Dank
expected: Thank you
got: Thank you

source: Das ist ein Baum
expected: This is a tree
got: So this is a tree
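
For reference, a corpus-level BLEU score like the one reported above can be computed with NLTK. This is a minimal sketch with toy tokenized data, not the repository's actual eval.py:

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# each hypothesis is paired with a list of acceptable reference translations
references = [
    [["I'm", "not", "really", "sure", "about", "the", "answer"]],
    [["this", "is", "a", "tree"]],
]
hypotheses = [
    ["I'm", "not", "sure", "what", "I'm", "going", "to", "answer"],
    ["so", "this", "is", "a", "tree"],
]

# smoothing avoids zero scores when some higher-order n-gram has no match
score = corpus_bleu(references, hypotheses,
                    smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score * 100:.2f}")
```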


pytorch-transformer's Issues

A little difference between tf version and your code in layer_normalization

Hi, thanks for your implementation!
When I looked at your code and compared it with the TF version by Kyubyong, I found that the implementation of layer normalization in your repository differs slightly from Kyubyong's.

The following are your code and Kyubyong's; he added ** 0.5 in the denominator:

https://github.com/leviswind/pytorch-transformer/blob/master/modules.py#L69

https://github.com/Kyubyong/transformer/blob/master/modules.py#L36
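
To make the difference concrete, here is a hedged sketch of manual layer normalization in PyTorch; the ** 0.5 turns the variance into a standard deviation in the denominator, which is the usual formulation. Names are illustrative, not the exact code from either repository:

```python
import torch

def layer_norm(x, gamma, beta, eps=1e-8):
    # normalize over the last (feature) dimension
    mean = x.mean(dim=-1, keepdim=True)
    variance = x.var(dim=-1, unbiased=False, keepdim=True)
    # the ** 0.5 divides by the standard deviation, not the raw variance
    normalized = (x - mean) / ((variance + eps) ** 0.5)
    return gamma * normalized + beta
```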

There are many errors in this implementation; it's really difficult to run

First, backend.Embedding has been removed from PyTorch.
Second, the tensors in this repo are not all on the same device, which really messes up the code. I tried to fix the device the tensors are stored on, but the more I modified, the more errors were raised; in the end I had no choice but to give up reading and deploying this code. I suggest defining the code purely on the CPU rather than scattering device-specific code throughout the repo; unless it's clear that every operation is compatible with the device each variable is stored on, the repo becomes a mess.
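
As a general pattern (not a fix specific to this repository), mixed-device errors can be avoided by choosing one device up front and moving the model and every input tensor to it:

```python
import torch
import torch.nn as nn

# choose a single device for everything
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(512, 512).to(device)  # stand-in for the real model
x = torch.randn(8, 512, device=device)  # create inputs directly on that device
y = model(x)                            # no CPU/CUDA mismatch
```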

How to add some functionalities to this code?

Hi. I would like to add some features to this code that I have read about for beam search, like coverage penalty and length normalization, but I don't know where to start. Can you help, please?
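
For context, the length normalization usually cited for beam search is the one from GNMT (Wu et al., 2016). Here is a minimal sketch of rescoring a hypothesis with it; the coverage penalty additionally needs the attention weights, which are omitted here:

```python
def length_penalty(length, alpha=0.6):
    # GNMT length normalization: lp(Y) = (5 + |Y|)^alpha / (5 + 1)^alpha
    return ((5.0 + length) ** alpha) / ((5.0 + 1.0) ** alpha)

def rescore(sum_log_prob, length, alpha=0.6):
    # divide the summed log-probability by the penalty so that longer
    # hypotheses are not unfairly punished during beam search
    return sum_log_prob / length_penalty(length, alpha)
```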

Error while running train.py

Traceback (most recent call last):
File "train.py", line 90, in
train()
File "train.py", line 81, in train
writer.export_scalars_to_json(hp.model_dir + '/all_scalars.json')
AttributeError: 'SummaryWriter' object has no attribute 'export_scalars_to_json'

How can I correct it, please?
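
One likely cause (an assumption, since the issue does not say which package provides the writer): export_scalars_to_json exists on tensorboardX's SummaryWriter but not on torch.utils.tensorboard's. A minimal sketch of a workaround:

```python
# Prefer tensorboardX, whose SummaryWriter provides export_scalars_to_json;
# otherwise fall back to PyTorch's built-in writer and skip the export call.
try:
    from tensorboardX import SummaryWriter
    HAS_EXPORT = True
except ImportError:
    from torch.utils.tensorboard import SummaryWriter
    HAS_EXPORT = False

writer = SummaryWriter(log_dir="runs")
writer.add_scalar("loss", 0.5, global_step=1)
if HAS_EXPORT:
    writer.export_scalars_to_json("./all_scalars.json")
writer.close()
```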

A little mistake in modules.py

In the if __name__ == '__main__': block,
outputs = position_encoding(num_units)(inputs)
should be
outputs = positional_encoding(num_units)(inputs)

Question of the parameter 'sinusoid'

I want to know why the parameter 'sinusoid' is used. And could I avoid setting the parameter 'maxlen' and instead use varying text lengths between batches?
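
For comparison, the paper's fixed sinusoidal encoding can be computed on the fly for any sequence length, whereas a learned encoding needs a fixed-size maxlen table, which is presumably why the hyperparameter exists. A minimal illustrative sketch (not this repository's positional_encoding), assuming num_units is even:

```python
import torch

def sinusoidal_encoding(seq_len, num_units):
    # PE(pos, 2i)   = sin(pos / 10000^(2i / num_units))
    # PE(pos, 2i+1) = cos(pos / 10000^(2i / num_units))
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)  # (seq_len, 1)
    i = torch.arange(0, num_units, 2, dtype=torch.float32)         # (num_units/2,)
    angles = pos / torch.pow(10000.0, i / num_units)               # (seq_len, num_units/2)
    pe = torch.zeros(seq_len, num_units)
    pe[:, 0::2] = torch.sin(angles)
    pe[:, 1::2] = torch.cos(angles)
    return pe  # fixed (not trained), so no maxlen table is strictly required
```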
