
universal-transformer-pytorch's Introduction

Universal-Transformer-Pytorch

Simple and self-contained implementation of the Universal Transformer (Dehghani et al., 2018) in PyTorch. Please open an issue if you find bugs, and send a pull request if you want to contribute.

GIF taken from: https://twitter.com/OriolVinyalsML/status/1017523208059260929

Universal Transformer

The basic Transformer model is taken from https://github.com/kolloldas/torchnlp. The following has been implemented so far:

  • Universal Transformer encoder-decoder, with position and timestep embeddings (the recurrence is sketched after this list).
  • Adaptive Computation Time (Graves, 2016), as described in the Universal Transformer paper.
  • Universal Transformer for the bAbI data.
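
As a rough sketch of the core idea (illustrative names only, not the repo's actual API): the Universal Transformer applies one shared transition block repeatedly, re-adding the position signal and a per-step timestep signal before every application.

def ut_encoder_sketch(x, transition_fn, pos_signal, time_signals, num_steps):
    # x: (batch, seq, hidden); pos_signal: (seq, hidden); time_signals: (num_steps, hidden)
    # One shared block applied num_steps times, with coordinate embeddings
    # re-injected at every step, as in Dehghani et al. (2018).
    for t in range(num_steps):
        x = x + pos_signal          # position embedding (same every step)
        x = x + time_signals[t]     # timestep embedding (varies per step)
        x = transition_fn(x)        # shared self-attention + transition block
    return x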

Dependencies

python3
pytorch 0.4
torchtext
argparse

How to run

To run standard Universal Transformer on bAbI run:

python main.py --task 1

To run Adaptive Computation Time:

python main.py --task 1 --act

Results

Results on the bAbI 10k setting; the best result over 10 runs is reported. Values are error rates (%).

Tasks 16, 17, 18, and 19 are very hard to converge on, even on the training set; the problem seems to be the learning-rate scheduling. Moreover, the results on the 1k setting are still very bad, so some hyper-parameters may need tuning.

Task   Uni-Trs   + ACT   Original
----   -------   -----   --------
1      0.0       0.0     0.0
2      0.0       0.2     0.0
3      0.8       2.4     0.4
4      0.0       0.0     0.0
5      0.4       0.1     0.0
6      0.0       0.0     0.0
7      0.4       0.0     0.0
8      0.2       0.1     0.0
9      0.0       0.0     0.0
10     0.0       0.0     0.0
11     0.0       0.0     0.0
12     0.0       0.0     0.0
13     0.0       0.0     0.0
14     0.0       0.0     0.0
15     0.0       0.0     0.0
16     50.5      50.6    0.4
17     13.7      14.1    0.6
18     4.0       6.9     0.0
19     79.2      65.2    2.8
20     0.0       0.0     0.0
----   -------   -----   --------
avg    7.46      6.98    0.21
fail   3         3       0

TODO

  • Visualize ACT on different tasks


universal-transformer-pytorch's Issues

Unable to reproduce results (tested on Task 1 & 2)

Hi,

I ran the experiments on the 10k setting, but my results are far worse than the reported ones.
I didn't change any of the default parameters except setting the tenK param (main.py, line 64) to True. Then I ran python main.py --act --verbose --cuda.

There are no errors and the results from 10 runs are:
Task 1
Noam False ACT True Task: 1 Max: 0.492 Mean: 0.42350000000000004 Std: 0.0808062497582953
Task 2
Noam False ACT True Task: 2 Max: 0.323 Mean: 0.26880000000000004 Std: 0.04480580319556831

I have not tried the other tasks (though at least task 3 seems to behave the same), as something seems to be going wrong generally. The results are the same in a non-CUDA setup, and worse without ACT enabled.

I'm running with the following versions:
python 3.6.8
pytorch 0.4.0 (also tried 0.4.1 and 1.0.0)
torchtext 0.3.1
argparse 1.4.0

Thanks for your help!

"task" argument has no effect

Hi,
currently the --task argument is ignored, due to line 153ff in main.py, so the script always runs all bAbI tasks in a row. A possible fix is sketched below.
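
A plausible fix, as a hypothetical sketch (args.task mirrors the --task flag, and run_babi_task stands in for the existing per-task training call): restrict the loop to the requested task instead of sweeping all twenty.

tasks = [args.task] if args.task is not None else list(range(1, 21))
for task_id in tasks:
    run_babi_task(task_id)  # placeholder for the existing per-task training logic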

Execution without CUDA throws an error

Hi,
when running the script on a machine without CUDA support, I get the following error:

File ".../Universal-Transformer-Pytorch/models/UTransformer.py", line 236, in forward
halting_probability = torch.zeros(inputs.shape[0],inputs.shape[1]).cuda()
RuntimeError: torch.cuda.FloatTensor is not enabled.

I suppose lines 236-242 in UTransformer.py need an additional CUDA check.
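
One device-agnostic way to write those allocations, as a sketch (buffer names beyond halting_probability are illustrative): allocate on whatever device the inputs already live on.

import torch

def init_act_buffers(inputs):
    # Follow the device of the inputs instead of hard-coding .cuda().
    device = inputs.device
    halting_probability = torch.zeros(inputs.shape[0], inputs.shape[1], device=device)
    remainders = torch.zeros_like(halting_probability)  # illustrative companion buffers
    n_updates = torch.zeros_like(halting_probability)
    return halting_probability, remainders, n_updates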

if-statement on projecting embedding to hidden size

I found that at models/UTransformer.py:110 and 194 you have the following code:

self.proj_flag = False
if(embedding_size == hidden_size):
    self.embedding_proj = nn.Linear(embedding_size, hidden_size, bias=False)
    self.proj_flag = True

I'm confused that you project the embedding to hidden_size when embedding_size == hidden_size; what happens when embedding_size != hidden_size? Doing nothing? Wouldn't that lead to a size mismatch?
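
Presumably the intended condition is the negation, i.e. only add a projection when the sizes differ. A sketch of the likely intent (mirroring the snippet above, inside the module's __init__):

# Likely intent: project only when embedding and hidden sizes differ.
self.proj_flag = False
if embedding_size != hidden_size:
    self.embedding_proj = nn.Linear(embedding_size, hidden_size, bias=False)
    self.proj_flag = True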

Question about PE

Hi, I noticed you implemented a function to compute the position embedding; however, I could not find anywhere it is used. Can you please help me understand how you incorporate position information into the model?
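
For context, the Universal Transformer paper adds two sinusoidal "coordinate" signals at every recurrent step: one indexed by position and one by the timestep. A minimal sketch of such a signal (illustrative, not necessarily the repo's exact helper):

import math
import torch

def timing_signal(length, channels, min_timescale=1.0, max_timescale=1.0e4):
    # Sinusoidal signal of shape (length, channels), tensor2tensor-style;
    # index it by position for the position signal, by step for the timestep signal.
    position = torch.arange(length, dtype=torch.float32)
    num_timescales = channels // 2
    log_increment = math.log(max_timescale / min_timescale) / max(num_timescales - 1, 1)
    inv_timescales = min_timescale * torch.exp(
        -torch.arange(num_timescales, dtype=torch.float32) * log_increment)
    scaled = position.unsqueeze(1) * inv_timescales.unsqueeze(0)
    return torch.cat([torch.sin(scaled), torch.cos(scaled)], dim=1)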

Questions about the result

Very glad to find this code. I'm a little confused by the output result: is the output acc the accuracy or the error rate? According to the code it seems to be accuracy, but if so it is too small, as I got an output acc of almost 0. Could you please help me figure out the problem?

ReLU in PositionwiseFeedForward

Here i is the index into self.layers, so it is always less than the length of self.layers and the check is always true.

Probably you mean

if i < len(self.layers) - 1

Then there would be no ReLU or Dropout after the last position-wise layer.
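
A sketch of the loop with that corrected condition (illustrative structure; the repo's actual layer construction may differ):

import torch.nn as nn
import torch.nn.functional as F

class PositionwiseFeedForwardSketch(nn.Module):
    def __init__(self, sizes, dropout=0.1):
        super().__init__()
        # sizes = [d_in, d_hidden, ..., d_out]; one Linear per consecutive pair.
        self.layers = nn.ModuleList(
            nn.Linear(d_in, d_out) for d_in, d_out in zip(sizes[:-1], sizes[1:]))
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        for i, layer in enumerate(self.layers):
            x = layer(x)
            if i < len(self.layers) - 1:  # the condition suggested above
                x = self.dropout(F.relu(x))  # no ReLU/Dropout after the last layer
        return x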

Expressing interest in this work

Hi, I found this implementation very interesting.
I would like to understand more about the Universal Transformer, since I think it could allow much smaller LLMs with higher performance.

P.S. I am Italian too.

May I ask you some questions about it?

Probability exceeds threshold at step 2 from the second epoch onwards

Hi,

when I run the model, the first epoch can reach the max step of 24, but from the second or third epoch onwards the probability from p = self.sigma(self.p(state)).squeeze(-1) becomes very close to the threshold and exceeds it at step 2, so my encoder effectively has only 2 layers. Any idea why?
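
For reference, a sketch of the ACT halting logic (per Graves, 2016, and the UT paper; names are illustrative, and remainder bookkeeping is omitted): per-position halting probabilities accumulate each step, and a position stops once its sum crosses 1 - eps. If the halting unit saturates near 1 after some training, the sum crosses the threshold after only a couple of steps, which matches the behavior described above.

import torch

def act_steps_sketch(state, halting_unit, transition_fn, max_steps, eps=0.01):
    # state: (batch, seq, hidden); halting_unit: hidden -> per-position prob in (0, 1).
    batch, seq, _ = state.shape
    halting_prob = torch.zeros(batch, seq, device=state.device)
    step = 0
    while step < max_steps and (halting_prob < 1.0 - eps).any():
        p = halting_unit(state).squeeze(-1)              # the sigma(p(state)) in the issue
        still_running = (halting_prob < 1.0 - eps).float()
        halting_prob = halting_prob + p * still_running  # accumulate only where not halted
        state = transition_fn(state)                     # shared transition block
        step += 1
    return state, step  # few steps taken <=> p was already close to the threshold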
