
recurrent-batch-normalization-pytorch's Introduction

Recurrent Batch Normalization

PyTorch implementation of Recurrent Batch Normalization proposed by Cooijmans et al. (2017).


recurrent-batch-normalization-pytorch's Issues

Boolean to Float Tensor

What do you mean in line 262, mask = (time < length).float().unsqueeze(1).expand_as(h_next)? I get the error AttributeError: 'bool' object has no attribute 'float', since (time < length) returns the Python bool True.
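
For what it's worth, this error typically appears when length is passed as a plain Python int, so (time < length) is an ordinary scalar comparison yielding a bool. A minimal sketch (my own, not from the repo) of passing length as a tensor of per-sequence lengths instead, so the comparison yields a tensor that supports .float():

    import torch

    batch_size = 4
    time = 2                              # current time step (Python int)
    length = torch.tensor([3, 1, 5, 2])   # hypothetical per-sequence lengths
    # Broadcasting int < tensor gives a tensor, so .float() is valid:
    mask = (time < length).float().unsqueeze(1)   # shape: (batch_size, 1)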

sequence-wise normalization

Hello,

I am looking for PyTorch code for LSTM batch normalization, i.e. the sequence-wise normalization described in Cooijmans' paper, that works on inputs of variable time length, as in Penn Treebank. I have your code for s-MNIST/p-MNIST running on my local machine, and I would like to extend that setup, ideally by replacing SeparatedBatchNorm1d with a new module. Thank you in advance.

Hiroshi
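
For readers landing here: a minimal sketch (my own, not the repo's SeparatedBatchNorm1d, and only an assumption about how to match the paper's sequence-wise scheme) of per-time-step batch normalization that copes with variable lengths by reusing the last step's statistics beyond max_length, as Cooijmans et al. suggest:

    import torch
    import torch.nn as nn

    class PerStepBatchNorm(nn.Module):
        """One BatchNorm1d per time step, so every step keeps its own
        running statistics; steps beyond max_length reuse the last set."""

        def __init__(self, num_features, max_length):
            super().__init__()
            self.max_length = max_length
            self.bns = nn.ModuleList(
                [nn.BatchNorm1d(num_features) for _ in range(max_length)])

        def forward(self, x, time):
            # x: (batch, num_features); time: int index of the current step
            return self.bns[min(time, self.max_length - 1)](x)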

'<' not supported between instances of 'int' and 'tuple'

Hi thanks for the implementation.

When I run your code, I get the error: TypeError: '<' not supported between instances of 'int' and 'tuple'

at the line: mask = (time < length).float().unsqueeze(1).expand_as(h_next)
in the _forward_rnn function of the LSTM class.

Do you have any idea what might cause it?

Thanks.
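
Without seeing the call site this is only a guess, but the message suggests length reached _forward_rnn as a Python tuple (for example, a tuple of ints coming out of a DataLoader). A hypothetical one-line conversion before calling forward:

    # Assumed shape of the problem: length = (3, 1, 5, 2) as a tuple.
    length = torch.tensor(length, dtype=torch.long)  # now int < tensor works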

RuntimeError: arguments are located on different GPUs

Hi, thank you for sharing the implemented BN-LSTM with us. I have used it successfully when there is only one GPU on my machine.

However, if I use the data parallel layer for multiple GPUs (4 on my machine),

model = torch.nn.DataParallel(model).cuda()

There is an error:
RuntimeError: arguments are located on different GPUs at /py/conda-bld/pytorch_1493677666423/work/torch/lib/THC/generic/THCTensorMathBlas.cu:232

The error occurs at wh = torch.mm(h_0, self.weight_hh)

Has anybody faced the same issue?
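
One frequent cause of this error with DataParallel is a tensor created inside forward on the default GPU rather than on the replica's GPU; the default hidden state is a plausible suspect here. A hedged sketch of a common workaround, deriving the initial state from the input tensor so it lands on the same device as the replica's parameters (written in the Variable-era style the repo uses):

    if hx is None:
        # input_.data.new(...) allocates on input_'s GPU, matching the
        # replica that DataParallel scattered this chunk of the batch to.
        zeros = input_.data.new(self.num_layers, batch_size,
                                self.hidden_size).zero_()
        hx = (Variable(zeros), Variable(zeros.clone()))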

eps and momentum parameters

The initialization function of the SeparatedBatchNorm1d module has two arguments eps and momentum.

def __init__(self, num_features, max_length, eps=1e-5, momentum=0.1,
                 affine=True):

Is this momentum the same one we use in our optimization algorithm (e.g. SGD with momentum), or is it an additional momentum used only for the batch normalization process?

I couldn't find any mention of this in the original paper.
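
It is not the optimizer's momentum. In batch normalization, momentum is the update rate of the running mean and variance that are used at evaluation time, and eps is the small constant added to the variance for numerical stability. A sketch of the update rule in PyTorch's convention (which SeparatedBatchNorm1d presumably follows, mirroring nn.BatchNorm1d):

    # Per training step, for each feature (and here, for each time step):
    running_mean = (1 - momentum) * running_mean + momentum * batch_mean
    running_var = (1 - momentum) * running_var + momentum * batch_var
    # Normalization itself; eps guards against division by ~zero variance:
    y = (x - batch_mean) / torch.sqrt(batch_var + eps) * gamma + beta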

How are the mean and variance statistics computed separately for each time step in your code?

Thanks for sharing the code. I have read it, but I still couldn't find the code that computes the batch statistics. It seems you just call

bn_wh = self.bn_hh(wh)
bn_wi = self.bn_ih(wi)

each time in the forward pass. I guess the PyTorch batch-norm module doesn't automatically compute different means and variances across multiple forward calls. Could you explain this point? Thanks.
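
A hedged reading of the module, inferred from its name and its max_length argument rather than a confirmed description: SeparatedBatchNorm1d keeps a separate pair of running_mean/running_var buffers for every time step, and the cell passes the current step index so each step normalizes with, and updates, only its own statistics, e.g.:

    # Hypothetical call signature with an explicit time index:
    bn_wh = self.bn_hh(wh, time=time)
    bn_wi = self.bn_ih(wi, time=time)
    # A stock nn.BatchNorm1d called this way would instead pool statistics
    # across all time steps into a single running mean/variance.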

Is dropout really applied?

Hi, thanks for sharing the code. I have a question regarding dropout; I hope it's not a stupid one.
Here in the code:

    for layer in range(self.num_layers):
        cell = self.get_cell(layer)
        hx_layer = (hx[0][layer, :, :], hx[1][layer, :, :])

        if layer == 0:
            layer_output, (layer_h_n, layer_c_n) = LSTM._forward_rnn(
                cell=cell, input_=input_, length=length, hx=hx_layer)
        else:
            layer_output, (layer_h_n, layer_c_n) = LSTM._forward_rnn(
                cell=cell, input_=layer_output, length=length, hx=hx_layer)
        input_ = self.dropout_layer(layer_output)

It seems to me the dropout result should really feed the next layer's input, because input_ is never read again after layer == 0: the else branch passes layer_output directly. See the sketch below.
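
For completeness, a minimal sketch of the fix the report implies: reassign input_ each iteration and feed it to every layer, so the two branches collapse and the dropped-out activations actually reach the next layer (a hypothetical rewrite, not the repo's code):

    for layer in range(self.num_layers):
        cell = self.get_cell(layer)
        hx_layer = (hx[0][layer, :, :], hx[1][layer, :, :])
        # input_ is the embedding at layer 0 and the dropped-out output
        # of the previous layer afterwards.
        layer_output, (layer_h_n, layer_c_n) = LSTM._forward_rnn(
            cell=cell, input_=input_, length=length, hx=hx_layer)
        input_ = self.dropout_layer(layer_output)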

RuntimeError: the number of sizes provided must be greater or equal to the number of dimensions in the tensor at /b/wheel/pytorch-src/torch/lib/TH/generic/THTensor.c:290

Hello, I get this error when I try to replace the LSTM in the language model with the modified (but plain) version found in this repo. Do you have any idea where I should look to fix it? I would love to make a PR, but I'm really having a hard time debugging this.

The error seems to happen on this line:

mask = (time < length).float().unsqueeze(1).expand_as(h_next)
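
A guess, since the language-model setup is not shown: this error fires when expand_as is given fewer sizes than the tensor has dimensions, which happens if length carries an extra dimension, e.g. shape (batch, 1) instead of (batch,); unsqueeze(1) then produces a 3-D mask that cannot expand to the 2-D h_next. A hypothetical check and fix:

    # If length is (batch, 1), flatten it so the mask stays 2-D:
    length = length.view(-1)
    mask = (time < length).float().unsqueeze(1).expand_as(h_next)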

Size mismatch when using bnlstm and layer>1 - solution

Hello,
I was using your implementation for my project and encountered a size mismatch error when using bnlstm with 2 layers.

The mismatch happens because the second layer is initialized with an input-size parameter equal to hidden_size (you can see it here: https://github.com/jihunchoi/recurrent-batch-normalization-pytorch/blob/master/bnlstm.py#L237), but the forward function still passes the original input, which has size input_size (you can see this here: https://github.com/jihunchoi/recurrent-batch-normalization-pytorch/blob/master/bnlstm.py#L288).
So at this line of code, https://github.com/jihunchoi/recurrent-batch-normalization-pytorch/blob/master/bnlstm.py#L211, the input_ variable has size (input_size x something), but the second layer expects input_ to have size (hidden_size x something).

To solve this issue, I changed lines 288-289 to this:

    if layer == 0:
        layer_output, (layer_h_n, layer_c_n) = LSTM._forward_rnn(
            cell=cell, input_=input_, length=length, hx=hx)
    else:
        layer_output, (layer_h_n, layer_c_n) = LSTM._forward_rnn(
            cell=cell, input_=layer_output, length=length, hx=hx)

Thanks for publishing the implementation.
You can close this issue when/if you wish.

P.S.: By the way, I would appreciate it if you could find a solution to the other open issue about reset_parameters; or maybe, if reset_parameters does not affect the network much, we can just keep the default and forget about it.

Not a number

Hey,

Thank you for providing a great example of how to implement custom LSTMs. I have a NaN issue, however. I am trying to use your LSTM as a drop-in replacement for the PyTorch LSTM. In the first iterations all the hidden states are zero vectors, and the values become NaN very soon. Do you have any idea what might be causing the issue?

Thanks!
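
A hedged debugging checklist rather than a diagnosis: the paper (Cooijmans et al., 2017) reports that initializing the batch-norm gain to 0.1 matters for stable training, and early blow-ups are also commonly tamed by gradient clipping. A sketch, where matching BN weights by parameter name is an assumption about how they are named in the model:

    import torch.nn as nn

    # 1. Set every batch-norm gain (gamma) to 0.1, per the paper.
    for name, param in model.named_parameters():
        if 'bn' in name and name.endswith('weight'):
            nn.init.constant_(param, 0.1)

    # 2. Clip gradients after backward(), before optimizer.step().
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)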

Bi-direction; weight in line 281 is not defined

Thanks very much for releasing the code! Great job!

However, I have two questions after going through the code.
1. How can one build a bi-directional BatchNorm LSTM?
2. The weight variable in lines 281 and 282 of bnlstm.py is not defined. I have tried defining it in several ways but failed; any suggestions?
Specifically, the code:

    if hx is None:
        hx = (Variable(nn.init.xavier_uniform(
                  weight.new(self.num_layers, batch_size, self.hidden_size))),
              Variable(nn.init.xavier_uniform(
                  weight.new(self.num_layers, batch_size, self.hidden_size))))
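
On question 2, a common pattern (a sketch, not a confirmed fix for this repo) is to grab any existing parameter so the new state inherits its tensor type and device:

    # `weight` here is only a donor for .new(): a type and device template.
    weight = next(self.parameters()).data
    if hx is None:
        hx = (Variable(nn.init.xavier_uniform(
                  weight.new(self.num_layers, batch_size, self.hidden_size))),
              Variable(nn.init.xavier_uniform(
                  weight.new(self.num_layers, batch_size, self.hidden_size))))
    # Note: in recent PyTorch the initializer is nn.init.xavier_uniform_
    # (trailing underscore).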
