
graykode / nlp-tutorial

Natural Language Processing Tutorial for Deep Learning Researchers

Home Page: https://www.reddit.com/r/MachineLearning/comments/amfinl/project_nlptutoral_repository_who_is_studying/

License: MIT License

Languages: Python 38.97%, Jupyter Notebook 61.03%
Topics: nlp, natural-language-processing, tutorial, pytorch, tensorflow, transformer, attention, paper, bert

nlp-tutorial's Introduction

nlp-tutorial

nlp-tutorial is a tutorial for those who are studying NLP (Natural Language Processing) using PyTorch. Most of the models are implemented in fewer than 100 lines of code (excluding comments and blank lines).

  • [08-14-2020] Old TensorFlow v1 code is archived in the archive folder. To keep the code readable for beginners, only PyTorch version 1.0 or higher is supported.

Curriculum - (Example Purpose)

1. Basic Embedding Model

2. CNN(Convolutional Neural Network)

3. RNN(Recurrent Neural Network)

4. Attention Mechanism

5. Model based on Transformer

Dependencies

  • Python 3.5+
  • PyTorch 1.0.0+

Author

  • Tae Hwan Jung(Jeff Jung) @graykode
  • Author Email : [email protected]
  • Acknowledgements to mojitok for the NLP Research Internship.


nlp-tutorial's Issues

seq2seq_torch may have a small mistake

# output : [max_len+1, batch_size, num_directions(=1) * n_hidden]
    output = output.transpose(0, 1) # [batch_size, max_len+1(=6), num_directions(=1) * n_hidden]

to

# output : [max_len+1, batch_size, n_class]
    output = output.transpose(0, 1) # [batch_size, max_len+1(=6), n_class]

Why is it src_len+1 in the Transformer demo?

self.pos_emb = nn.Embedding.from_pretrained(get_sinusoid_encoding_table(src_len+1, d_model),freeze=True)

The position encoding table should be (max_len, d_model), so why add 1?
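For context, a minimal sketch of how such a sinusoidal table is typically built (my own illustration, not the repository's exact get_sinusoid_encoding_table; whether the extra row is meant as a reserved padding position is an assumption on my part):

    import numpy as np
    import torch

    def sinusoid_table(n_position, d_model):
        # one row per position; sin on even dimensions, cos on odd dimensions
        pos = np.arange(n_position)[:, None]                     # [n_position, 1]
        dim = np.arange(d_model)[None, :]                        # [1, d_model]
        angle = pos / np.power(10000, 2 * (dim // 2) / d_model)  # [n_position, d_model]
        table = np.zeros((n_position, d_model))
        table[:, 0::2] = np.sin(angle[:, 0::2])
        table[:, 1::2] = np.cos(angle[:, 1::2])
        return torch.FloatTensor(table)

    # with src_len + 1 rows, one extra position index (e.g. 0) can be looked up safely
    pos_emb = torch.nn.Embedding.from_pretrained(sinusoid_table(5 + 1, 8), freeze=True)
    print(pos_emb(torch.LongTensor([[1, 2, 3, 4, 0]])).shape)    # torch.Size([1, 5, 8])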

The Adam in 5-1.Transformer should be replaced by SGD

Line 202 :
optimizer = optim.Adam(model.parameters(), lr=0.001)

In practice, I think Adam works quite badly here: when epoch = 10 the cost is 1.6, and when epoch = 100 or 1000 the cost is still 1.6.
So I think we can change Adam to SGD, that is, optimizer = optim.SGD(model.parameters(), lr=0.001)

Here are the effects of using SGD:

Epoch: 0100 cost = 0.047965
Epoch: 0200 cost = 0.020129
Epoch: 0300 cost = 0.012563
Epoch: 0400 cost = 0.009101
Epoch: 0500 cost = 0.007131
Epoch: 0600 cost = 0.005862
Epoch: 0700 cost = 0.004978
Epoch: 0800 cost = 0.004325
Epoch: 0900 cost = 0.003823
Epoch: 1000 cost = 0.003426

Seq2Seq(Attention) Input Shape Question

Seq2Seq(Attention)\Seq2Seq(Attention)-Tensor.py

The shape of the input should be [max_time, batch_size, ...]. input = tf.transpose(dec_inputs, [1, 0, 2]) has already transposed it. In tf.expand_dims(inputs[i], 1) the expansion does add one dimension, but it seems the expansion should be on dimension 0 here. Although the final shape is correct, is this intentional, or just a little trick?
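For illustration only, a small PyTorch sketch (the repository's file is TensorFlow, but the axis question is the same): expanding dimension 0 versus dimension 1 both add a size-1 axis, just in different places.

    import torch

    x = torch.zeros(4, 8)          # e.g. one decoder step: [batch_size(=4), n_hidden(=8)]
    print(x.unsqueeze(0).shape)    # torch.Size([1, 4, 8]) -> [1, batch_size, n_hidden]
    print(x.unsqueeze(1).shape)    # torch.Size([4, 1, 8]) -> [batch_size, 1, n_hidden]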

Code 4-1.Seq2Seq might have a wrong section

In the function translate (at line 90), there is no predefined object 'args',
and the function make_batch does not expect these arguments, but '[[word, 'P' * len(word)]], args' is passed.

So I think the code should be modified,

from

    def translate(word, args):
        input_batch, output_batch, _ = make_batch([[word, 'P' * len(word)]], args)

        # make hidden shape [num_layers * num_directions, batch_size, n_hidden]
        hidden = torch.zeros(1, 1, args.n_hidden)
        output = model(input_batch, hidden, output_batch)
        # output : [max_len+1(=6), batch_size(=1), n_class]

        predict = output.data.max(2, keepdim=True)[1] # select n_class dimension
        decoded = [char_arr[i] for i in predict]
        end = decoded.index('E')
        translated = ''.join(decoded[:end])

        return translated.replace('P', '')

to

# Test
    def translate(word):
        input_batch, output_batch = make_testbatch(word)

        # make hidden shape [num_layers * num_directions, batch_size, n_hidden]
        hidden = torch.zeros(1, 1, n_hidden)
        output = model(input_batch, hidden, output_batch)
        # output : [max_len+1(=6), batch_size(=1), n_class]

        predict = output.data.max(2, keepdim=True)[1] # select n_class dimension
        decoded = [char_arr[i] for i in predict]
        end = decoded.index('E')
        translated = ''.join(decoded[:end])

        return translated.replace('P', '')

and make_testbatch should be declared beforehand:

#make test batch
def make_testbatch(input_word):
    input_batch, output_batch = [], []

    input_w = input_word + 'P' * (n_step - len(input_word))
    input = [num_dic[n] for n in input_w]
    
    #make a sequence with just start token(S) and pad tokens(P)
    output = [num_dic[n] for n in 'S' + 'P' * n_step]

    input_batch = np.eye(n_class)[input]
    output_batch = np.eye(n_class)[output]

    return torch.FloatTensor(input_batch).unsqueeze(0), torch.FloatTensor(output_batch).unsqueeze(0)

Thank you

TextCNN-Tensor.py

Hello,

I think there is a problem with this file; it is identical to TextCNN-Torch.py.

I guess it should be the TensorFlow version?

Thanks anyway for this repo

Some problems about Bert

Line 70: index = randint(0, vocab_size - 1) # random index in vocabulary
I think the replacement index should not be able to produce '[CLS]', '[SEP]' or '[MASK]'!
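A minimal sketch of what the issue suggests (my own illustration; number_dict and the special-token names follow the tutorial's BERT code, the toy vocabulary here is made up): keep re-sampling until the random index is not a special token.

    from random import randint

    def random_ordinary_index(number_dict, vocab_size,
                              special_tokens=('[CLS]', '[SEP]', '[MASK]', '[PAD]')):
        # sample a vocabulary index whose token is not a special token
        index = randint(0, vocab_size - 1)
        while number_dict[index] in special_tokens:
            index = randint(0, vocab_size - 1)
        return index

    # usage with a toy vocabulary
    number_dict = {0: '[PAD]', 1: '[CLS]', 2: '[SEP]', 3: '[MASK]', 4: 'hello', 5: 'world'}
    print(random_ordinary_index(number_dict, vocab_size=len(number_dict)))  # prints 4 or 5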

Seq2Seq pytorch

Hi
thanks for sharing your codes.

I've read your seq2seq implementation and I was wondering about the RNN Encoder-Decoder model.

in the paper, 'Learning Phrase Representations using RNN Encoder–Decoder
for Statistical Machine Translation'

They describe a new hidden-state activation and a proposed gating unit:
[screenshots of the equations from the paper]

and I couldn't find the new hidden-state activation function in your code.

Do you have any plan to add the proposed activation?
Or is it okay to just skip that part?

thank you so much in advance
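For reference, the "new hidden-state activation" the issue refers to is the gated update proposed in that paper (what later became the GRU). A sketch of the equations in standard notation, not the repository's variable names:

    r_t  = sigmoid(W_r x_t + U_r h_{t-1})          # reset gate
    z_t  = sigmoid(W_z x_t + U_z h_{t-1})          # update gate
    h~_t = tanh(W x_t + U (r_t * h_{t-1}))         # candidate hidden state (the new activation)
    h_t  = z_t * h_{t-1} + (1 - z_t) * h~_t        # gated interpolation

In PyTorch this is what torch.nn.GRU / torch.nn.GRUCell implement, so using nn.GRU for the encoder and decoder cells would give the gated activation without writing it by hand.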

Version 2.0 will be updated

Hello. It's been about 2 years since the repository started; thank you for your interest.
Most of the code is now legacy code that current PyTorch or TensorFlow no longer uses, so we want to update it to a new version.

  • PyTorch higher than 1.0.0

There is no plan to support TensorFlow v2, because the Python-like PyTorch is more readable for beginners.
In addition, the philosophies of PyTorch and TensorFlow are very different, and good code cannot be produced by trying to implement one like the other.

Therefore, the existing TensorFlow v1 code will be archived in a new folder.

Which kind of model is better for keyword-set classification?

There is a similar task called text classification.

But I want to find a kind of model whose inputs are a keyword set, and the keyword set does not come from a sentence.

For example:

input ["apple", "pear", "water melon"] --> target class "fruit"
input ["tomato", "potato"] --> target class "vegetable"

Another example:

input ["apple", "Peking", "in summer"]  -->  target class "Chinese fruit"
input ["tomato", "New York", "in winter"]  -->  target class "American vegetable"
input ["apple", "Peking", "in winter"]  -->  target class "Chinese fruit"
input ["tomato", "Peking", "in winter"]  -->  target class "Chinese vegetable"

Thank you.
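For illustration only, a minimal sketch (my own, not from this repository) of a common baseline for set-valued inputs: embed each keyword, average the embeddings so that order does not matter, and classify the pooled vector.

    import torch
    import torch.nn as nn

    class KeywordSetClassifier(nn.Module):
        # toy baseline: mean of keyword embeddings -> linear classifier
        def __init__(self, vocab_size, embed_dim, num_classes):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, embed_dim)
            self.fc = nn.Linear(embed_dim, num_classes)

        def forward(self, keyword_ids):           # keyword_ids: [batch_size, set_size]
            vecs = self.emb(keyword_ids)          # [batch_size, set_size, embed_dim]
            pooled = vecs.mean(dim=1)             # order-independent pooling over the set
            return self.fc(pooled)                # [batch_size, num_classes]

    # usage with a toy vocabulary of 10 keywords and 2 classes
    model = KeywordSetClassifier(vocab_size=10, embed_dim=16, num_classes=2)
    logits = model(torch.tensor([[1, 4, 7], [2, 5, 5]]))
    print(logits.shape)                           # torch.Size([2, 2])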

Transformer/Transformer(Greedy_decoder)-Torch.py on gpu

Hello, I want to put the Transformer(Greedy_decoder)-Torch.py code on the GPU, using model = model.to(device) and also moving input_data with .to(device), but the error still appears: "Expected object of backend CUDA but backend CPU for argument #2 'mat2'".
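Not a diagnosis of this exact file, but the usual cause of this kind of error is that some tensor or layer is still created on the CPU inside forward. A minimal sketch of the pattern that avoids it (my own example, not the repository's model): register layers in __init__ so that model.to(device) moves them, and give run-time tensors the inputs' device.

    import torch
    import torch.nn as nn

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    class TinyModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.proj = nn.Linear(8, 4)          # registered in __init__, so model.to(device) moves it

        def forward(self, x):
            # any tensor created at run time must be placed on the same device as the inputs
            positions = torch.arange(x.size(1), device=x.device).float().unsqueeze(-1)
            return self.proj(x) + positions

    model = TinyModel().to(device)               # moves every registered parameter
    x = torch.randn(2, 3, 8).to(device)          # moves the inputs as well
    print(model(x).shape, model(x).device)       # torch.Size([2, 3, 4]) on the chosen device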

About training seq2seq(attention)-Torch with multiple samples

Hello, first of all thank you for your code. I want to know how I should modify the code if batch_size is more than 1. Thank you.

    def get_att_weight(self, output, enc_output):  # get attention weight one 'output' with 'enc_output'
        '''
        output: [1, batch_size, num_directions(=1) * n_hidden]
        enc_output: [n_step+1, batch_size, num_directions(=1) * n_hidden]
        '''
        length = len(enc_output)
        attn_scores = torch.zeros(length)  # attn_scores : [batch_size, n_step+1]
        for i in range(length):
            attn_scores[i] = self.get_att_score(output, enc_output[i])

        # Normalize scores to weights in range 0 to 1
        # return [batch_size, 1, n_step+1]
        return F.softmax(attn_scores).view(batch_size, 1, -1)

    def get_att_score(self, output, enc_output):
        '''
        output: [batch_size, num_directions(=1) * n_hidden]
        enc_output: [batch_size, num_directions(=1) * n_hidden]
        '''
        score = self.attn(enc_output)  # score : [1, n_hidden]
        return torch.dot(output.view(-1), score.view(-1))  # inner product make scalar value, get a real number

TextCNN_Torch has a wrong comment

def forward(self, X): embedded_chars = self.W[X] # [batch_size, sequence_length, sequence_length]

I think the shape is [batch_size, sequence_length, embedding_size]

A question about the decoder in seq2seq-torch

input_batch, output_batch, _ = make_batch([[word, 'P' * len(word)]])

Hi, I'm an NLP rookie and I want to ask you a question. I read the seq2seq paper, which uses the output at step t-1 as the input at step t in the decoder. Your code in this line uses 'SPPPPP' as the decoder input. So, does this approach harm the result?
If you see this issue, please answer me in your free time.
Although my English is poor, I still want to express my gratitude to you.
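For illustration, a hedged sketch (not the repository's code; decoder_step and toy_step are made-up helpers) of the alternative the paper describes: decode one step at a time and feed the previous prediction back in, instead of a fixed 'S' + padding sequence.

    import torch

    def greedy_decode(decoder_step, start_id, end_id, state, max_len=6):
        # feed each predicted token back in as the next input (greedy decoding)
        out, prev = [], start_id
        for _ in range(max_len):
            logits, state = decoder_step(prev, state)  # one decoder step
            prev = int(torch.argmax(logits))           # previous prediction becomes the next input
            if prev == end_id:
                break
            out.append(prev)
        return out

    # toy decoder_step: ignores the state and always predicts (input_id + 1) % 5
    def toy_step(prev, state):
        logits = torch.zeros(5)
        logits[(prev + 1) % 5] = 1.0
        return logits, state

    print(greedy_decode(toy_step, start_id=0, end_id=4, state=None))  # [1, 2, 3]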

A question about seq2seq-torch.py at line 43

Hi, I'm an NLP rookie and I want to ask you a question. Your code extracts the input (context) with a fixed window around line 43, and "word sequence" is a list of sentences, so some words may take their neighbour words from different sentences. Does this harm the result?

And my training result does not seem very good, even though I didn't change the code.
[screenshot of the training result]

If you see this issue, please answer me in your free time.
Although my English is poor, I still want to express my gratitude to you.

A question about seq2seq-torch.py at line 74

I think there is a small problem at line 74 of "seq2seq-torch.py": the dimension of input_batch and output_batch is not [batch_size, max_len, n_hidden] but [batch_size, max_len, n_class]. Or maybe I don't fully understand your code :) Please help me, thanks~

Question about tensor.view operation in Bi-LSTM(Attention)

hidden = final_state.view(-1, n_hidden * 2, 1) # hidden : [batch_size, n_hidden * num_directions(=2), 1(=n_layer)]

Hi, this repo is awesome, but there might be something wrong in the code above. According to the comment, this snippet intends to change a tensor from shape [num_layers(=1) * num_directions(=2), batch_size, n_hidden] to shape [batch_size, n_hidden * num_directions(=2), 1(=n_layer)], i.e. to concatenate the 2 hidden vectors from the two directions for every data example in a batch (by "data example" I mean one of the batch_size examples). But I think the code above will mix up the data examples in a batch and lead to unexpected results.

For example, we can use IPython to check the effect of the snippet above.

# create a tensor with shape [num_layers(=1) * num_directions(=2), batch_size, n_hidden]                                                                                           
In [10]: a=torch.arange(2*3*5).reshape(2,3,5) 
                                                                       
In [11]: a                                                             
Out[11]:                                                               
tensor([[[ 0,  1,  2,  3,  4],                                         
         [ 5,  6,  7,  8,  9],                                         
         [10, 11, 12, 13, 14]],                                        
                                                                       
        [[15, 16, 17, 18, 19],                                         
         [20, 21, 22, 23, 24],                                         
         [25, 26, 27, 28, 29]]])                                       
                                                                       
In [12]: a.view(-1,10,1)                                               
Out[12]:                                                               
tensor([[[ 0],                                                         
         [ 1],                                                         
         [ 2],                                                         
         [ 3],                                                         
         [ 4],                                                         
         [ 5],                                                         
         [ 6],                                                         
         [ 7],                                                         
         [ 8],                                                         
         [ 9]],                                                        
                                                                       
        [[10],                                                         
         [11],                                                         
         [12],                                                         
         [13],                                                         
         [14],                                                         
         [15],                                                         
         [16],                                                         
         [17],                                                         
         [18],                                                         
         [19]],                                                        
                                                                       
        [[20],                                                         
         [21],                                                         
         [22],                                                         
         [23],                                                         
         [24],                                                         
         [25],                                                         
         [26],                                                         
         [27],                                                         
         [28],                                                         
         [29]]])                                                       
                                                                       
                         

As you can see, we created a tensor with batch_size=3 and n_hidden=5. For example, [ 0, 1, 2, 3, 4] and [15, 16, 17, 18, 19] belong to the same data example in the batch but come from different directions, so what we want is to concatenate them in the resulting tensor. But what the code really does is concatenate [ 0, 1, 2, 3, 4] and [ 5, 6, 7, 8, 9], which are from different data examples in the batch.

I think it can be fixed by changing the line of code to hidden = torch.cat([final_state[0], final_state[1]], 1).view(-1, n_hidden * 2, 1).

The effect of the new code can be shown as follows:

In [13]: torch.cat([a[0],a[1]],1).view(-1,10,1)
Out[13]:
tensor([[[ 0],
         [ 1],
         [ 2],
         [ 3],
         [ 4],
         [15],
         [16],
         [17],
         [18],
         [19]],

        [[ 5],
         [ 6],
         [ 7],
         [ 8],
         [ 9],
         [20],
         [21],
         [22],
         [23],
         [24]],

        [[10],
         [11],
         [12],
         [13],
         [14],
         [25],
         [26],
         [27],
         [28],
         [29]]])

A question about Autocomplete LSTM Tensorflow

In Autocomplete we already have

X = tf.placeholder(tf.float32, [None, n_step, n_class]) # [batch_size, n_step, n_class]
Y = tf.placeholder(tf.float32, [None, n_class])         

to guess the next missing character.

  1. How can I customize them to guess more than one character? I don't have any idea about multiplying a tensor by a tensor.
  2. In outputs, states = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)
     Why is the shape of states always (2,)? What does the 2 really mean?
     Thank you for sharing the information.

doubts about the TextCNN Code

class TextCNN(nn.Module):
    def __init__(self):
        super(TextCNN, self).__init__()

        self.num_filters_total = num_filters * len(filter_sizes)
        self.W = nn.Parameter(torch.empty(vocab_size, embedding_size).uniform_(-1, 1)).type(dtype)
        self.Weight = nn.Parameter(torch.empty(self.num_filters_total, num_classes).uniform_(-1, 1)).type(dtype)
        self.Bias = nn.Parameter(0.1 * torch.ones([num_classes])).type(dtype)

    def forward(self, X):
        embedded_chars = self.W[X] # [batch_size, sequence_length, sequence_length]
        embedded_chars = embedded_chars.unsqueeze(1) # add channel(=1) [batch, channel(=1), sequence_length, embedding_size]

        pooled_outputs = []
        for filter_size in filter_sizes:
            # conv : [input_channel(=1), output_channel(=3), (filter_height, filter_width), bias_option]
            conv = nn.Conv2d(1, num_filters, (filter_size, embedding_size), bias=True)(embedded_chars)
            h = F.relu(conv)
            # mp : ((filter_height, filter_width))
            mp = nn.MaxPool2d((sequence_length - filter_size + 1, 1))
            # pooled : [batch_size(=6), output_height(=1), output_width(=1), output_channel(=3)]
            pooled = mp(h).permute(0, 3, 2, 1)
            pooled_outputs.append(pooled)

        h_pool = torch.cat(pooled_outputs, len(filter_sizes)) # [batch_size(=6), output_height(=1), output_width(=1), output_channel(=3) * 3]
        h_pool_flat = torch.reshape(h_pool, [-1, self.num_filters_total]) # [batch_size(=6), output_height * output_width * (output_channel * 3)]

        model = torch.mm(h_pool_flat, self.Weight) + self.Bias # [batch_size, num_classes]
        return model

I wonder if it's wrong to create conv inside the loop?
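For reference, a hedged sketch (my own, not the repository's file) of the more usual pattern: register the convolutions once in __init__ via nn.ModuleList, so their weights are part of the model and actually get trained, rather than creating new nn.Conv2d objects on every forward pass.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TextCNNSketch(nn.Module):
        def __init__(self, vocab_size=20, embedding_size=8, num_classes=2,
                     filter_sizes=(2, 3, 4), num_filters=3):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, embedding_size)
            self.convs = nn.ModuleList(
                nn.Conv2d(1, num_filters, (fs, embedding_size)) for fs in filter_sizes)
            self.fc = nn.Linear(num_filters * len(filter_sizes), num_classes)

        def forward(self, x):                          # x: [batch_size, sequence_length]
            emb = self.embedding(x).unsqueeze(1)       # [batch, 1, seq_len, emb_size]
            pooled = [F.relu(conv(emb)).max(dim=2)[0].squeeze(2) for conv in self.convs]
            return self.fc(torch.cat(pooled, dim=1))   # [batch_size, num_classes]

    model = TextCNNSketch()
    print(model(torch.randint(0, 20, (6, 5))).shape)   # torch.Size([6, 2])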

Some mistakes in Transformer Position Encoding & BERT

1. mistake in Transformer

# Padding Should be Zero
src_vocab = {'P' : 0, 'ich' : 1, 'mochte' : 2, 'ein' : 3, 'bier' : 4}
src_vocab_size = len(src_vocab)

tgt_vocab = {'P' : 0, 'i' : 1, 'want' : 2, 'a' : 3, 'beer' : 4, 'S' : 5, 'E' : 6}
number_dict = {i: w for i, w in enumerate(tgt_vocab)}
tgt_vocab_size = len(tgt_vocab)

I rewrote my code to be clearer.
There were some mistakes in the Transformer's position encoding: because of torch.LongTensor([[1,2,3,4,5]]), the indexing of the position Embedding gets mixed up.

So I fixed the shape of get_sinusoid_encoding_table accordingly.
In the Encoder, self.pos_emb(torch.LongTensor([[5,1,2,3,4]])) is right for 'ich mochte ein bier P', and in the Decoder, self.pos_emb(torch.LongTensor([[5,1,2,3,4]])) is right for 'S i want a beer'.

2. Too heavy BERT as tutorial

In the original paper, maxlen is 512 and n_layers (the number of layers) is 12, but that is too heavy to run for a tutorial, so I fixed the parameters as below.

# BERT Parameters
maxlen = 30
batch_size = 6
max_pred = 5 # max tokens of prediction
n_layers = 6
n_heads = 12
d_model = 768
d_ff = 768*4 # 4*d_model, FeedForward dimension
d_k = d_v = 64  # dimension of K(=Q), V
n_segments = 2

Also, as in other BERT implementation repositories, when preprocessing the masking, [CLS], [SEP] and [PAD] should not be replaced with [MASK].

cand_maked_pos = [i for i, token in enumerate(input_ids)] # this is wrong

https://github.com/dhlee347/pytorchic-bert/blob/master/pretrain.py#L132 does it correctly, so I fixed it accordingly.

Then I added a segment mask so that zero-padding tokens are masked out.
This is a very important problem.
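A toy illustration of the suggested filtering (variable names mirror the tutorial's BERT code; the token ids and the sentence here are made up): candidate positions for [MASK] exclude the special tokens.

    # toy vocabulary and input
    word_dict = {'[PAD]': 0, '[CLS]': 1, '[SEP]': 2, '[MASK]': 3, 'hello': 4, 'world': 5}
    input_ids = [1, 4, 5, 2, 4, 2, 0, 0]   # [CLS] hello world [SEP] hello [SEP] [PAD] [PAD]

    special_ids = (word_dict['[CLS]'], word_dict['[SEP]'], word_dict['[PAD]'])
    cand_maked_pos = [i for i, token in enumerate(input_ids) if token not in special_ids]
    print(cand_maked_pos)                  # [1, 2, 4] -> only ordinary word positions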

BERT-Torch.py may have a small mistake

Lines 69-70:
index = randint(0, vocab_size - 1) # random index in vocabulary
input_ids[pos] = word_dict[number_dict[index]]
The length of number_dict is 25, but vocab_size is 29, so number_dict[index] might be out of range.
Maybe we should change line 69 to index = randint(0, len(word_list) - 1)?

3-3-bilstm-torch comment error

class BiLSTM(nn.Module):
    def __init__(self):
        super(BiLSTM, self).__init__()

        self.lstm = nn.LSTM(input_size=n_class, hidden_size=n_hidden, bidirectional=True)
        self.W = nn.Parameter(torch.randn([n_hidden * 2, n_class]).type(dtype))
        self.b = nn.Parameter(torch.randn([n_class]).type(dtype))

    def forward(self, X):
        input = X.transpose(0, 1)  # input : [n_step, batch_size, n_class]

        hidden_state = Variable(torch.zeros(1*2, len(X), n_hidden))   # [num_layers(=1) * num_directions(=1), batch_size, n_hidden]
        cell_state = Variable(torch.zeros(1*2, len(X), n_hidden))     # [num_layers(=1) * num_directions(=1), batch_size, n_hidden]

        outputs, (_, _) = self.lstm(input, (hidden_state, cell_state))
        outputs = outputs[-1]  # [batch_size, n_hidden]
        model = torch.mm(outputs, self.W) + self.b  # model : [batch_size, n_class]
        return model

error: "outputs = outputs[-1] # [batch_size, n_hidden]"
the shape should be [batch_size,2*n_hidden]

about skip-gram code

I don't quite understand why 'batch_inputs' and 'batch_labels' should be updated in each loop iteration in Word2Vec-Skipgram-Tensor(Softmax).py.

Also, what does 'trained_embeddings = W.eval()' mean?

Could you explain it to me? I am a bit confused.

# code
for epoch in range(5000):
    batch_inputs, batch_labels = random_batch(skip_grams, batch_size)
    _, loss = sess.run([optimizer, cost], feed_dict={inputs: batch_inputs, labels: batch_labels})

    if (epoch + 1) % 1000 == 0:
        print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.6f}'.format(loss))

    trained_embeddings = W.eval()

Bi-LSTM attention calc may be wrong

lstm_output : [batch_size, n_step, n_hidden * num_directions(=2)], F matrix

def attention_net(self, lstm_output, final_state):
    batch_size = len(lstm_output)
    hidden_forward = final_state[0]
    hidden_backward = final_state[1]
    hidden_f_b = torch.cat((hidden_forward, hidden_backward), 1)
    hidden = hidden_f_b.view(batch_size, -1, 1)
    # hidden = final_state.view(batch_size, -1, 1)
    # ^ this line in the source code is wrong: a Bi-LSTM's final_state has shape [2, batch_size, n_hidden],
    #   so the forward and backward hidden states must be concatenated per example first;
    #   final_state.view(batch_size, -1, 1) does not pair final_state[0][i] with final_state[1][i]

a question about transformer

class MultiHeadAttention(nn.Module):
    def __init__(self):
        super(MultiHeadAttention, self).__init__()
        self.W_Q = nn.Linear(d_model, d_k * n_heads)
        self.W_K = nn.Linear(d_model, d_k * n_heads)
        self.W_V = nn.Linear(d_model, d_v * n_heads)

    def forward(self, Q, K, V, attn_mask):
        # q: [batch_size x len_q x d_model], k: [batch_size x len_k x d_model], v: [batch_size x len_k x d_model]
        residual, batch_size = Q, Q.size(0)
        # (B, S, D) -proj-> (B, S, D) -split-> (B, S, H, W) -trans-> (B, H, S, W)
        q_s = self.W_Q(Q).view(batch_size, -1, n_heads, d_k).transpose(1,2) # q_s: [batch_size x n_heads x len_q x d_k]
        k_s = self.W_K(K).view(batch_size, -1, n_heads, d_k).transpose(1,2) # k_s: [batch_size x n_heads x len_k x d_k]
        v_s = self.W_V(V).view(batch_size, -1, n_heads, d_v).transpose(1,2) # v_s: [batch_size x n_heads x len_k x d_v]

        attn_mask = attn_mask.unsqueeze(1).repeat(1, n_heads, 1, 1) # attn_mask : [batch_size x n_heads x len_q x len_k]

        # context: [batch_size x n_heads x len_q x d_v], attn: [batch_size x n_heads x len_q(=len_k) x len_k(=len_q)]
        context, attn = ScaledDotProductAttention()(q_s, k_s, v_s, attn_mask)
        context = context.transpose(1, 2).contiguous().view(batch_size, -1, n_heads * d_v) # context: [batch_size x len_q x n_heads * d_v]
        output = nn.Linear(n_heads * d_v, d_model)(context)
        return nn.LayerNorm(d_model)(output + residual), attn # output: [batch_size x len_q x d_model]

The second-to-last line instantiates a new nn.Linear every time forward is called. Is that right? Shouldn't the layer be instantiated in the __init__ function?
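For reference, a minimal sketch (my own, not the repository's code) of the usual pattern: the output projection and LayerNorm are registered once in __init__ and reused in forward, so their weights are trained and moved together with the model.

    import torch
    import torch.nn as nn

    class OutputProjection(nn.Module):
        def __init__(self, n_heads=8, d_v=64, d_model=512):
            super().__init__()
            self.linear = nn.Linear(n_heads * d_v, d_model)   # created once
            self.layer_norm = nn.LayerNorm(d_model)           # created once

        def forward(self, context, residual):
            # context: [batch_size, len_q, n_heads * d_v], residual: [batch_size, len_q, d_model]
            return self.layer_norm(self.linear(context) + residual)

    proj = OutputProjection()
    out = proj(torch.randn(2, 5, 8 * 64), torch.randn(2, 5, 512))
    print(out.shape)                                          # torch.Size([2, 5, 512])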

5-1. Transformer may have a wrong position embedding

  1. In class Encoder: enc_outputs = self.src_emb(enc_inputs) + self.pos_emb(torch.LongTensor([[1,2,3,4,0]]))

I think it should be: enc_outputs = self.src_emb(enc_inputs) + self.pos_emb(torch.LongTensor([[0,1,2,3,4]]))

  2. In class Decoder: dec_outputs = self.tgt_emb(dec_inputs) + self.pos_emb(torch.LongTensor([[5,1,2,3,4]]))

I think it should be: dec_outputs = self.tgt_emb(dec_inputs) + self.pos_emb(torch.LongTensor([[0,1,2,3,4]]))

Question?

Is this repository no longer supported?

seq2seq(attention) has a wrong comment

context = tf.matmul(attn_weights, enc_outputs)
dec_output = tf.squeeze(dec_output, 0)  # [1, n_step]
context = tf.squeeze(context, 1)  # [1, n_hidden]

I think dec_output shape is [1,n_hidden]

Bi-LSTM (TF) may have a mistake

Calculating the attention score:

# Attention
outputs = tf.concat([output[0], output[1]], 2) # output[0] : lstm_fw, output[1] : lstm_bw
outputs = tf.transpose(outputs, [1, 0, 2]) # [n_step, batch_size, n_hidden]

# only the output of the last time step is used
final_hidden_state = outputs[-1]
output_all = tf.concat([output[0], output[1]], 2)
final_hidden_state = tf.expand_dims(final_hidden_state, 2)
attn_weights = tf.squeeze(tf.matmul(output_all, final_hidden_state), 2)

3-3.Bi-LSTM may have wrong padding

In line 16 you use
input = input + [0] * (max_len - len(input))
for the padding. You use 0, but index 0 corresponds to the first word 'Lorem', so it is not the right choice.
I think you can change it like this:

    # word_dict = {w: i for i, w in enumerate(list(set(sentence.split())))}
    # number_dict = {i: w for i, w in enumerate(list(set(sentence.split())))}
    word_dict = {w: i for i, w in enumerate(['PAD']+list(set(sentence.split())))}
    number_dict = {i: w for i, w in enumerate(['PAD']+list(set(sentence.split())))}

Faster attention calculation in 4-2.Seq2Seq?

Thanks for sharing! I just found that Attention.get_att_weight calculates attention in a for-loop. This looks rather slow, doesn't it?

4-2.Seq2Seq(Attention)/Seq2Seq(Attention).ipynb

    def get_att_weight(self, dec_output, enc_outputs):  # get attention weight one 'dec_output' with 'enc_outputs'
        n_step = len(enc_outputs)
        attn_scores = torch.zeros(n_step)  # attn_scores : [n_step]

        for i in range(n_step):
            attn_scores[i] = self.get_att_score(dec_output, enc_outputs[i])

        # Normalize scores to weights in range 0 to 1
        return F.softmax(attn_scores).view(1, 1, -1)

    def get_att_score(self, dec_output, enc_output):  # enc_outputs [batch_size, num_directions(=1) * n_hidden]
        score = self.attn(enc_output)  # score : [batch_size, n_hidden]
        return torch.dot(dec_output.view(-1), score.view(-1))  # inner product make scalar value

Suggested parallel version

    def get_att_weight(self, dec_output, enc_outputs):  # get attention weight one 'dec_output' with 'enc_outputs'
        n_step = len(enc_outputs)
        attn_scores = torch.zeros(n_step,device=self.device)  # attn_scores : [n_step]

        enc_t = self.attn(enc_outputs)
        score = dec_output.transpose(1,0).bmm(enc_t.transpose(1,0).transpose(2,1))
        out1   = score.softmax(-1)
        return out1

About make_batch of NNLM

input = [word_dict[n] for n in word[:-1]]  # create (1~n-1) as input
target = [word_dict[word[-1]]]

This constrains the input length to equal n_step. I think the following example is even better:

    for i in range(len(words) - window_size + 1):
        x_train.append(words[i: i + window_size - 1])
        y_train.append(words[i + window_size - 1])

Attention BiLSTM

How is it possible to use the Attention Layer in (4.3) for sequence-to-sequence classification, something like Named Entity Recognition or Semantic Role Labeling?
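Not an answer about the repository's attention layer specifically, but for token-level tasks like NER the usual baseline keeps the per-time-step Bi-LSTM outputs instead of pooling them into one sentence vector; a minimal sketch under that assumption (my own toy sizes):

    import torch
    import torch.nn as nn

    class BiLSTMTagger(nn.Module):
        # per-token tags from Bi-LSTM outputs (no pooling into one sentence vector)
        def __init__(self, vocab_size=30, embed_dim=16, n_hidden=8, num_tags=5):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, n_hidden, bidirectional=True, batch_first=True)
            self.fc = nn.Linear(n_hidden * 2, num_tags)

        def forward(self, x):                  # x: [batch_size, seq_len]
            out, _ = self.lstm(self.emb(x))    # out: [batch_size, seq_len, 2 * n_hidden]
            return self.fc(out)                # [batch_size, seq_len, num_tags] -> one tag per token

    model = BiLSTMTagger()
    print(model(torch.randint(0, 30, (2, 6))).shape)   # torch.Size([2, 6, 5])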

different Embedding way

In the code 'Seq2seq-torch.py' I saw you use np.eye, i.e. a one-hot representation, as the embedding, so I changed it to the usual way, using nn.Embedding(dict_length, embedding_dim). It works, but the loss I get is very high.
I want to ask what the difference between these two ways is. Here are my code and the result.

[screenshots of the modified code and the training result]
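For reference, a hedged sketch (my own, not the repository's code) of the two input styles side by side. With one-hot inputs the RNN's input weight matrix effectively plays the role of the embedding, while nn.Embedding makes that lookup explicit and learnable:

    import numpy as np
    import torch
    import torch.nn as nn

    n_class, embedding_dim = 7, 5
    ids = [3, 1, 6]                                      # token indices for one sequence

    # Style 1: one-hot rows, as the tutorial does with np.eye
    one_hot = torch.FloatTensor(np.eye(n_class)[ids])    # [seq_len, n_class]

    # Style 2: a learned dense lookup table
    emb = nn.Embedding(n_class, embedding_dim)
    dense = emb(torch.tensor([ids]))                     # [1, seq_len, embedding_dim]

    print(one_hot.shape, dense.shape)                    # torch.Size([3, 7]) torch.Size([1, 3, 5])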
