PyTorch implementation of Recurrent Batch Normalization proposed by Cooijmans et al. (2017).
What do you mean by line 262, `mask = (time < length).float().unsqueeze(1).expand_as(h_next)`?
I get the error `AttributeError: 'bool' object has no attribute 'float'`, since `(time < length)` returns a plain Python `True` rather than a tensor.
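A minimal sketch of what seems to be going on (my own example, not the repo's code): the comparison only yields a tensor mask when `length` is a tensor of per-sequence lengths, one entry per batch element; if `length` is a plain Python int, `time < length` evaluates to a bool, which has no `.float()` method.

```python
import torch

batch_size, hidden_size = 3, 4
length = torch.tensor([5, 2, 7])   # per-sequence lengths, one per batch element
h_next = torch.zeros(batch_size, hidden_size)

time = 3
# With a tensor `length`, the comparison broadcasts into a boolean tensor.
mask = (time < length).float().unsqueeze(1).expand_as(h_next)
print(mask[:, 0])  # tensor([1., 0., 1.]) -- sequence 1 has already ended
```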
Hello,
I am looking for PyTorch code for LSTM batch normalization, written as the sequence-wise normalization in Cooijmans' paper, that is applicable to input with variable time lengths, e.g. the Penn Treebank. I have your code for s-MNIST/p-MNIST running on my local machine. I would like to extend that setup, ideally by replacing SeparatedBatchNorm1d with a new module. Thank you in advance.
Hiroshi
Hi, thanks for the implementation.
When I run your code, I get the error `TypeError: '<' not supported between instances of 'int' and 'tuple'`
at the line `mask = (time < length).float().unsqueeze(1).expand_as(h_next)`
in the `_forward_rnn` function of the `LSTM` class.
Do you have any idea what might cause it?
Thanks.
Hi, thank you for sharing the implemented BN-LSTM. I have used it successfully when there is only one GPU on my machine.
However, if I use the data-parallel wrapper for multiple GPUs (4 on my machine),

```python
model = torch.nn.DataParallel(model).cuda()
```

there is an error:

```
RuntimeError: arguments are located on different GPUs at /py/conda-bld/pytorch_1493677666423/work/torch/lib/THC/generic/THCTensorMathBlas.cu:232
```

The error occurs at `wh = torch.mm(h_0, self.weight_hh)`.
Has anybody faced the same issue?
The initialization function of the SeparatedBatchNorm1d module has two arguments, `eps` and `momentum`:

```python
def __init__(self, num_features, max_length, eps=1e-5, momentum=0.1,
             affine=True):
```

Is this `momentum` the same one we use in our optimization algorithm (e.g. SGD), or is it an additional momentum just for the batch normalization process? I couldn't find any mention of this in the original paper.
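For what it's worth, in PyTorch's batch-norm convention this `momentum` is unrelated to the optimizer's momentum: it is the exponential-moving-average factor for the running statistics used at evaluation time. A small sketch of the update rule (my own illustration):

```python
import torch

# PyTorch batch-norm convention:
#   running_mean = (1 - momentum) * running_mean + momentum * batch_mean
momentum = 0.1
running_mean = torch.zeros(4)   # initial running statistic
batch_mean = torch.ones(4)      # statistic of the current mini-batch

running_mean = (1 - momentum) * running_mean + momentum * batch_mean
print(running_mean)  # tensor([0.1000, 0.1000, 0.1000, 0.1000])
```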
Why do you NOT use packed padding but instead use masks?
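For comparison, here is the stock packed-sequence path with `nn.LSTM` (my own example). A plausible reason for the mask-based loop is that a custom cell needing the time index at every step (as per-timestep batch norm does) cannot use `nn.LSTM`'s fused kernel, so the packing machinery doesn't apply directly:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

rnn = nn.LSTM(input_size=3, hidden_size=5)
x = torch.randn(7, 2, 3)            # (T, B, features), padded to T=7
lengths = torch.tensor([7, 4])      # true lengths, sorted descending

packed = pack_padded_sequence(x, lengths)
out_packed, _ = rnn(packed)
out, out_lengths = pad_packed_sequence(out_packed)
print(out.shape, out_lengths)  # torch.Size([7, 2, 5]) tensor([7, 4])
```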
Thanks for sharing the code. I have read it, but still couldn't find the code computing the batch statistics. It seems you just call

```python
bn_wh = self.bn_hh(wh)
bn_wi = self.bn_ih(wi)
```

each time in the forward pass. I guess the PyTorch batch-norm module doesn't automatically compute different means and variances for multiple forward calls. Could you explain this point? Thanks.
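The key idea in the paper is that each timestep keeps its own running statistics. A minimal sketch of that mechanism (my own simplified module, not the repo's `SeparatedBatchNorm1d`; affine parameters omitted for brevity):

```python
import torch
import torch.nn as nn

class PerTimestepBN(nn.Module):
    """Batch norm with one running mean/var buffer per timestep."""

    def __init__(self, num_features, max_length, momentum=0.1, eps=1e-5):
        super().__init__()
        self.momentum, self.eps = momentum, eps
        for t in range(max_length):
            self.register_buffer(f'running_mean_{t}', torch.zeros(num_features))
            self.register_buffer(f'running_var_{t}', torch.ones(num_features))

    def forward(self, x, time):
        mean = getattr(self, f'running_mean_{time}')
        var = getattr(self, f'running_var_{time}')
        if self.training:
            batch_mean = x.mean(0)
            batch_var = x.var(0, unbiased=False)
            with torch.no_grad():  # running stats are not part of the graph
                mean.mul_(1 - self.momentum).add_(self.momentum * batch_mean)
                var.mul_(1 - self.momentum).add_(self.momentum * batch_var)
            mean, var = batch_mean, batch_var  # normalize with batch stats
        return (x - mean) / torch.sqrt(var + self.eps)

bn = PerTimestepBN(num_features=4, max_length=6)
x = torch.randn(8, 4)
y = bn(x, time=2)   # only the t=2 buffers are read and updated
```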
Hi, thanks for sharing the code. I have a question regarding dropout; I hope it's not a stupid one.
Here in the code:

```python
for layer in range(self.num_layers):
    cell = self.get_cell(layer)
    hx_layer = (hx[0][layer, :, :], hx[1][layer, :, :])
    if layer == 0:
        layer_output, (layer_h_n, layer_c_n) = LSTM._forward_rnn(
            cell=cell, input_=input_, length=length, hx=hx_layer)
    else:
        layer_output, (layer_h_n, layer_c_n) = LSTM._forward_rnn(
            cell=cell, input_=layer_output, length=length, hx=hx_layer)
    input_ = self.dropout_layer(layer_output)
```

It seems to me the dropout should be applied to `layer_output` instead, because `input_` is not used after `layer == 0`.
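A self-contained toy version of the intended behavior (using stock `nn.LSTMCell` instead of the repo's BN cell): dropout is applied to each layer's output, and that dropped-out tensor is the variable actually fed to the next layer, so it is never dead code:

```python
import torch
import torch.nn as nn

num_layers, input_size, hidden_size, T, B = 2, 3, 5, 4, 2
cells = nn.ModuleList(
    [nn.LSTMCell(input_size if l == 0 else hidden_size, hidden_size)
     for l in range(num_layers)])
dropout = nn.Dropout(0.5)

x = torch.randn(T, B, input_size)
layer_input = x
for cell in cells:
    h = c = torch.zeros(B, hidden_size)
    outputs = []
    for t in range(T):
        h, c = cell(layer_input[t], (h, c))
        outputs.append(h)
    layer_output = torch.stack(outputs)   # (T, B, hidden_size)
    layer_input = dropout(layer_output)   # dropped output feeds the next layer

print(layer_output.shape)  # torch.Size([4, 2, 5])
```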
Hello, I get this error when I try to replace the LSTM in the language model with the modified (but normal) version found in this repo. Do you have an idea where I should look to fix it? I would love to make a PR, but I'm really having a hard time debugging this.
The error seems to happen on this line:

```python
mask = (time < length).float().unsqueeze(1).expand_as(h_next)
```
Hello,
I was using your implementation for my project and encountered a size-mismatch error when using bnlstm with 2 layers.
The mismatch happens because the second layer is initialized with an "input size" parameter of "hidden size" (you can see it here: https://github.com/jihunchoi/recurrent-batch-normalization-pytorch/blob/master/bnlstm.py#L237), but the forward function still passes the original input, which is of size "input size" (you can see this here: https://github.com/jihunchoi/recurrent-batch-normalization-pytorch/blob/master/bnlstm.py#L288).
So at this line of code: https://github.com/jihunchoi/recurrent-batch-normalization-pytorch/blob/master/bnlstm.py#L211
the `input_` variable is of size (input_size * something), but the second layer expects `input_` to be of size (hidden_size * something).
To solve this issue I changed lines 288-289 to this:

```python
if layer == 0:
    layer_output, (layer_h_n, layer_c_n) = LSTM._forward_rnn(
        cell=cell, input_=input_, length=length, hx=hx)
else:
    layer_output, (layer_h_n, layer_c_n) = LSTM._forward_rnn(
        cell=cell, input_=layer_output, length=length, hx=hx)
```
Thanks for publishing the implementation.
You can close this issue when/if you wish.
P.S.: By the way, I would appreciate it if you could find a possible solution to the other open issue about reset_parameters, or maybe we could say that reset_parameters does not affect the network much, so we can just make it the default and forget about it.
Hey,
Thank you for providing a great example of how to implement custom LSTMs. However, I have a NaN issue. I am trying to use your LSTM as a drop-in replacement for the PyTorch LSTM. In the first iterations all the hidden states are zero vectors, and the values become NaN very soon after. Do you have any idea what might be causing the issue?
Thanks!
Thanks very much for releasing the code! Great job!
However, I have run into two questions while going through the code.
1. How can I realize a bi-directional BatchNorm LSTM?
2. The `weight` in lines 281 and 282 of bnlstm.py has not been defined. I have tried changing the definition in several ways but failed; any suggestions?
Specifically, the code:

```python
if hx is None:
    hx = (Variable(nn.init.xavier_uniform(
              weight.new(self.num_layers, batch_size, self.hidden_size))),
          Variable(nn.init.xavier_uniform(
              weight.new(self.num_layers, batch_size, self.hidden_size))))
```
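One common way to define `weight` here is to borrow any existing parameter of the module, so the new hidden state inherits its dtype and device. A sketch under that assumption (`TinyModel` and `init_hidden` are my own illustrative names; `Variable` is unnecessary in modern PyTorch, and `xavier_uniform` is now `xavier_uniform_`):

```python
import torch
import torch.nn as nn

class TinyModel(nn.Module):
    def __init__(self, num_layers=2, hidden_size=5):
        super().__init__()
        self.num_layers, self.hidden_size = num_layers, hidden_size
        self.proj = nn.Linear(hidden_size, hidden_size)

    def init_hidden(self, batch_size):
        weight = next(self.parameters())  # template for dtype/device
        shape = (self.num_layers, batch_size, self.hidden_size)
        h0 = nn.init.xavier_uniform_(weight.new_empty(shape))
        c0 = nn.init.xavier_uniform_(weight.new_empty(shape))
        return h0, c0

model = TinyModel()
h0, c0 = model.init_hidden(batch_size=3)
print(h0.shape)  # torch.Size([2, 3, 5])
```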