Comments (3)
No, that code is fine. The initial hidden state is set at model.py lines 177, 196, and 193 (commit 32fcb42).
Also note that layer l receives the output of layer l-1 because raw_output is re-assigned here:
https://github.com/salesforce/awd-lstm-lm/blob/master/model.py#L81
Note: I have nothing to do with this project, but hope I have helped.
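To illustrate the re-assignment pattern described above, here is a minimal sketch (variable names follow model.py, but the sizes and the use of plain nn.LSTM layers are illustrative assumptions, not the exact awd-lstm-lm code):

```python
import torch
import torch.nn as nn

# two single-layer LSTMs standing in for the per-layer list in model.py
rnns = [nn.LSTM(10, 10, num_layers=1) for _ in range(2)]
# one zeroed (h, c) pair per layer
hidden = [(torch.zeros(1, 3, 10), torch.zeros(1, 3, 10)) for _ in range(2)]

raw_output = torch.randn(5, 3, 10)   # initial input: (seq_len, batch, emb)
for l, rnn in enumerate(rnns):
    # re-assigning raw_output is what feeds layer l-1's OUTPUT into
    # layer l as input; hidden[l] remains layer l's own state
    raw_output, new_h = rnn(raw_output, hidden[l])
```

After the loop, raw_output holds the top layer's output, while each layer consumed only its own entry of hidden.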
from awd-lstm-lm.
Thank you for your reply. My problem is exactly with this line: model.py line 81 (commit 32fcb42). As I debug it, hidden[0] (layer 0) is zeros as expected, but after the first iteration hidden[1] (layer 1) is zeros again (because the list hidden does not change).
I thought that the hidden state of each layer should be initialized with the hidden state of the previous layer, just like the input. You can see in the line above that the output of every layer becomes the input to the next one (raw_output is the same name for both input and output). This does not happen with the hidden state (hidden[l] != new_h). Most likely I am missing something; I hope I'm not confusing anyone! Thanks again.
Each layer of an RNN has its own hidden state. These all start at 0 and are updated to different values. The return from the forward function here gives the hidden states for each layer in a list, and on the next minibatch these are used as the starting point. To see this, run one minibatch through and look at new_hidden: it will have length equal to num_layers and contain different values for each layer. You can also see this in the init_hidden function, which creates the properly shaped initial hidden state, one for each layer of the RNN.
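The behaviour described above can be sketched as follows. This is a simplified model (assumed sizes, plain nn.LSTM layers, and hypothetical init_hidden/forward helpers mirroring the shape of the real code), not the actual awd-lstm-lm implementation:

```python
import torch
import torch.nn as nn

num_layers, nhid, bsz = 3, 8, 4
rnns = [nn.LSTM(nhid, nhid, num_layers=1) for _ in range(num_layers)]

def init_hidden(bsz):
    # one zeroed (h, c) pair per layer, like model.init_hidden
    return [(torch.zeros(1, bsz, nhid), torch.zeros(1, bsz, nhid))
            for _ in range(num_layers)]

def forward(x, hidden):
    new_hidden, raw_output = [], x
    for l, rnn in enumerate(rnns):
        # layer l reads ONLY its own previous state, hidden[l];
        # its updated state goes into new_hidden, not into the next layer
        raw_output, new_h = rnn(raw_output, hidden[l])
        new_hidden.append(new_h)
    return raw_output, new_hidden

hidden = init_hidden(bsz)
out, hidden = forward(torch.randn(5, bsz, nhid), hidden)  # minibatch 1
# hidden now holds num_layers distinct, non-zero states ...
out, hidden = forward(torch.randn(5, bsz, nhid), hidden)  # ... reused here
assert len(hidden) == num_layers
```

So hidden states flow forward in time (minibatch to minibatch within the same layer), while only the outputs flow upward between layers, which is why hidden[1] starts at zeros on the first minibatch even though layer 0 has already produced output.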