keon / seq2seq

Minimal Seq2Seq model with Attention for Neural Machine Translation in PyTorch

License: MIT License

Language: Python 100.00%
Topics: seq2seq, deep-learning, machine-translation

seq2seq's Introduction

mini seq2seq

Minimal Seq2Seq model with attention for neural machine translation in PyTorch.

This implementation focuses on the following features:

  • Modular structure to be used in other projects
  • Minimal code for readability
  • Full utilization of batches and GPU.

This implementation relies on torchtext to minimize dataset management and preprocessing.

Model description

Requirements

  • GPU & CUDA
  • Python3
  • PyTorch
  • torchtext
  • Spacy
  • numpy
  • Visdom (optional)

Download the spaCy tokenizer models:

python -m spacy download de
python -m spacy download en
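
A minimal sketch (not necessarily this repo's exact code) of how the downloaded spaCy models are typically wired into torchtext Fields for Multi30k; the field options (init_token, eos_token, lower, min_freq) are illustrative assumptions:

import spacy
from torchtext.data import Field
from torchtext.datasets import Multi30k

spacy_de = spacy.load('de')  # German model downloaded above
spacy_en = spacy.load('en')  # English model downloaded above

def tokenize_de(text):
    return [tok.text for tok in spacy_de.tokenizer(text)]

def tokenize_en(text):
    return [tok.text for tok in spacy_en.tokenizer(text)]

DE = Field(tokenize=tokenize_de, init_token='<sos>', eos_token='<eos>', lower=True)
EN = Field(tokenize=tokenize_en, init_token='<sos>', eos_token='<eos>', lower=True)

train, val, test = Multi30k.splits(exts=('.de', '.en'), fields=(DE, EN))
DE.build_vocab(train.src, min_freq=2)
EN.build_vocab(train.trg, min_freq=2)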

References

Based on the following implementations

seq2seq's People

Contributors

amitmy, keon, linao1996, pskrunner14

seq2seq's Issues

About overfitting

I tried the code yesterday; after 100 epochs the training error went down to almost zero, yet the test error is 7.23, rendering the model almost useless.
Early stopping won't help, since the validation error never went below 4.
Any advice?

torchtext Multi30k

When using the following call to create the data splits
train, val, test = Multi30k.splits(exts=('.de', '.en'), fields=(DE, EN))
I got the following error message:


//anaconda/lib/python3.5/site-packages/torchtext/datasets/translation.py in __init__(self, path, exts, fields, **kwargs)
     31
     32         examples = []
---> 33         with open(src_path) as src_file, open(trg_path) as trg_file:
     34             for src_line, trg_line in zip(src_file, trg_file):
     35                 src_line, trg_line = src_line.strip(), trg_line.strip()

FileNotFoundError: [Errno 2] No such file or directory: '.data/val.de'

Do you have any idea what is going on?
Thank you in advance.
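
A hedged guess at the usual cause, with a small diagnostic sketch: the legacy torchtext Multi30k.splits() looks for the extracted split files under its root directory (default '.data'), so this FileNotFoundError typically means the download or extraction of one split was incomplete. The file list below is an assumption; deleting the partial '.data' directory and re-running usually re-triggers the download.

import os

root = '.data'
expected = ['train.de', 'train.en', 'val.de', 'val.en']  # the test split files should be present too
missing = [f for f in expected if not os.path.exists(os.path.join(root, f))]
print('missing files:', missing)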

A problem with loss computation.

loss = F.nll_loss(output[1:].view(-1, vocab_size), trg[1:].contiguous().view(-1), ignore_index=pad)

The loss computed by the line above is averaged over every time step, which can make the model difficult to train.
I suggest accumulating (summing) the loss over the time steps instead; in my experiments this made the model easier to train.
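
A hedged sketch of that suggestion, reusing the names from the line above (output, trg, vocab_size, pad) and assuming trg has shape (maxLen, batch) as elsewhere in this repo: sum the NLL over all non-pad tokens and normalize by the batch size only.

import torch.nn.functional as F

loss = F.nll_loss(output[1:].view(-1, vocab_size),
                  trg[1:].contiguous().view(-1),
                  ignore_index=pad,
                  reduction='sum') / trg.size(1)  # sum over time steps, mean over the batch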

about the way to calculate attention weight

It seems that the way the attention weights are calculated differs from the original paper, softmax(v * tanh(W * [s; h])): here softmax is applied inside score() and relu is applied afterwards. Can you give a reason or a reference?

def forward(self, hidden, encoder_outputs):
    timestep = encoder_outputs.size(0)
    h = hidden.repeat(timestep, 1, 1).transpose(0, 1)
    encoder_outputs = encoder_outputs.transpose(0, 1)  # [B*T*H]
    attn_energies = self.score(h, encoder_outputs)
    return F.relu(attn_energies).unsqueeze(1)

def score(self, hidden, encoder_outputs):
    # [B*T*2H] -> [B*T*H]
    energy = F.softmax(self.attn(torch.cat([hidden, encoder_outputs], 2)), dim=2)
    energy = energy.transpose(1, 2)  # [B*H*T]
    v = self.v.repeat(encoder_outputs.size(0), 1).unsqueeze(1)  # [B*1*H]
    energy = torch.bmm(v, energy)  # [B*1*T]
    return energy.squeeze(1)  # [B*T]
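
For comparison, a minimal sketch (not the repository's code) of the standard additive (Bahdanau) scoring the issue refers to: energies = v^T tanh(W[s; h]), followed by a softmax over the time dimension. Shapes follow the snippet above.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        self.attn = nn.Linear(hidden_size * 2, hidden_size)
        self.v = nn.Parameter(torch.rand(hidden_size))

    def forward(self, hidden, encoder_outputs):
        # hidden: [B, H] decoder state, encoder_outputs: [B, T, H]
        T = encoder_outputs.size(1)
        h = hidden.unsqueeze(1).repeat(1, T, 1)                                  # [B, T, H]
        energy = torch.tanh(self.attn(torch.cat([h, encoder_outputs], dim=2)))   # [B, T, H]
        scores = energy.matmul(self.v)                                           # [B, T]
        return F.softmax(scores, dim=1)                                          # weights over time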

About the usage of initial hidden state in calculating attention

Hi, this is a really good seq2seq implementation in PyTorch, but I have a doubt about the following line:

hidden = hidden[:self.decoder.n_layers]

We can see from the code that the hidden state used to calculate the attention is the final hidden state from the encoder, while in theory it should be the hidden state from the decoder at the current time step. Do you agree? Thx :-)

Why use relu to compute additive attention

1. Attention formula

  • In the normal additive version, the attention score is:
score = v * tanh(W * [hidden; encoder_outputs])
  • In your code:
score = v * relu(W * [hidden; encoder_outputs])

2. Question

Is there some trick here, or is this the result of an experimental comparison?

Model still uses teacher forcing when evaluating

Hi, I think there might be a bug in the evaluate function in train.py: when the trained model is evaluated with the evaluate function, the model still uses teacher forcing, so there is a 50% probability that the true label is fed in as the input for the next time step. I think this is unsuitable, because during evaluation we should always feed the predicted label as the input for the next step.
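
A hedged sketch of the fix being suggested: evaluate with teacher forcing disabled so the decoder always consumes its own previous prediction. The teacher_forcing_ratio argument is an assumption about the model's forward signature (many seq2seq implementations expose one), and model, val_iter, vocab_size, and pad follow the surrounding snippets.

import torch
import torch.nn.functional as F

model.eval()
with torch.no_grad():
    for batch in val_iter:
        src, trg = batch.src, batch.trg
        output = model(src, trg, teacher_forcing_ratio=0.0)  # never feed the gold token back in
        loss = F.nll_loss(output[1:].view(-1, vocab_size),
                          trg[1:].contiguous().view(-1),
                          ignore_index=pad)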

A question about the nn.Embedding

Thank you for sharing this project's code; I have a question about nn.Embedding.

In this project, the shape of src and trg is (maxLen, batch size). The Encoder's forward is:

    def forward(self, src, hidden=None):
        embedded = self.embed(src)
        outputs, hidden = self.gru(embedded, hidden)
        # sum bidirectional outputs
        outputs = (outputs[:, :, :self.hidden_size] +
                   outputs[:, :, self.hidden_size:])
        return outputs, hidden

When I debug it, the shape of src is (37, 32), where 32 is the batch size.
However, when I read the documentation of nn.Embedding, the example code shows:

>>> # a batch of 2 samples of 4 indices each
>>> input = torch.LongTensor([[1,2,4,5],[4,3,2,9]])
>>> embedding(input)

Thus, the input of Embedding should be (batch size, maxLen).

This makes me very confused.

Any suggestion is appreciated!
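
A short sketch addressing the shape question: nn.Embedding accepts index tensors of any shape and simply appends the embedding dimension, so (maxLen, batch) is just as valid as (batch, maxLen). The relevant constraint is that nn.GRU without batch_first=True expects (seq_len, batch, features), which is why this project keeps src as (maxLen, batch).

import torch
import torch.nn as nn

embed = nn.Embedding(num_embeddings=100, embedding_dim=8)
gru = nn.GRU(input_size=8, hidden_size=16)      # batch_first=False by default

src = torch.randint(0, 100, (37, 32))           # (maxLen, batch), as in the debug run above
embedded = embed(src)                           # (37, 32, 8)
outputs, hidden = gru(embedded)                 # outputs: (37, 32, 16)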

A bug in the loss function in 'train' and 'evaluate'

Hi, I found that the loss function in 'train' and 'evaluate' is cross-entropy, but in model.py the decoder's output is passed through a log-softmax. According to its definition, cross_entropy already includes the log_softmax operation. I think the loss function in 'train' and 'evaluate' should be nll_loss; the decoder's final log-softmax combined with nll_loss then gives the cross-entropy.
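
A quick numerical check of this point: cross_entropy already applies log_softmax internally, so log_softmax followed by nll_loss is equivalent to cross_entropy applied to raw logits (and applying cross_entropy to log-softmax outputs would double-count the normalization).

import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)                     # (batch, vocab) raw decoder scores
targets = torch.randint(0, 10, (4,))

ce = F.cross_entropy(logits, targets)
nll = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(ce, nll))                  # True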

[ERROR] relu does not work

The following error occurs when executing "train.py".

TypeError: relu() got an unexpected keyword argument 'dim'

I checked the official PyTorch documentation, and relu() does not take a "dim" argument in any version.
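
For illustration, a tiny sketch of the difference: F.softmax takes a dim argument because it normalizes along a dimension, while F.relu is elementwise and accepts no dim, so passing one raises exactly this TypeError.

import torch
import torch.nn.functional as F

x = torch.randn(2, 3)
F.softmax(x, dim=1)    # fine: softmax normalizes along a dimension
F.relu(x)              # fine: relu is elementwise
# F.relu(x, dim=1)     # TypeError: relu() got an unexpected keyword argument 'dim'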

random seed

The demo doesn't seem to fix the random seed, so the log is different on every run.
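
A hedged sketch of fixing the seeds so runs are repeatable (the cudnn flags are needed for full determinism on GPU, at some speed cost):

import random
import numpy as np
import torch

SEED = 1234                                     # any fixed value
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False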
