
Comments (3)

ccarter-cs commented on June 1, 2024

No, that code is fine. The initial hidden state is set here:

hidden = model.init_hidden(args.batch_size)

This variable is re-assigned here:

output, hidden, rnn_hs, dropped_rnn_hs = model(data, hidden, return_h=True)

Finally, the previous hidden states are detached here:

hidden = repackage_hidden(hidden)

Also note that layer l receives the output of layer l-1 as its input, because raw_output is re-assigned on every pass through the layer loop here:
https://github.com/salesforce/awd-lstm-lm/blob/master/model.py#L81
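
The three steps above (initialise, re-assign, detach) can be sketched as a small loop. This is only an illustration, not the project's training script: a single-layer nn.LSTM stands in for the real model, the sizes and random data are made up, and repackage_hidden is written here from its description (detach every tensor in the hidden state), so treat it as an assumption.

```python
import torch
import torch.nn as nn

def repackage_hidden(h):
    """Return the same hidden values, detached from the graph that produced them."""
    if isinstance(h, torch.Tensor):
        return h.detach()
    return tuple(repackage_hidden(v) for v in h)

torch.manual_seed(0)
lstm = nn.LSTM(input_size=4, hidden_size=3)  # hypothetical stand-in model
batch_size, seq_len = 2, 5

# Initial state: zeros shaped (num_layers, batch, hidden_size), one tensor each for h and c.
hidden = (torch.zeros(1, batch_size, 3), torch.zeros(1, batch_size, 3))

for step in range(3):
    data = torch.randn(seq_len, batch_size, 4)  # made-up minibatch
    hidden = repackage_hidden(hidden)           # keep the values, drop the history
    output, hidden = lstm(data, hidden)         # hidden is re-assigned here
    output.sum().backward()                     # gradients stop at the detached state

carried = repackage_hidden(hidden)              # what the next minibatch would start from
```

Without the detach, the second call to backward() would try to reach back into the graph of the first minibatch, which has already been freed.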

Note: I have nothing to do with this project, but hope I have helped.

from awd-lstm-lm.

mourga commented on June 1, 2024

Thank you for your reply. My problem is exactly with this line:

raw_output, new_h = rnn(raw_output, hidden[l])

because when I debug it, hidden[0] (layer 0) is zeros as expected, but after the first iteration hidden[1] (layer 1) is zeros again (because the list hidden does not change).

I thought that the hidden state of each layer should be initialised with the hidden state of the previous layer, just like the input. You can see in the line above that the output of every layer becomes the input to the next one (raw_output is the name used for both input and output). This does not happen with the hidden state (hidden[l] != new_h).

Most likely I am missing something, and I hope I'm not confusing anyone! Thanks again.


ccarter-cs commented on June 1, 2024

Each layer of an RNN has its own hidden state. These all start at zero and are updated to different values. The forward function returns the hidden states for all layers in a list. On the next minibatch, these are used as the starting point. To see this, run one minibatch through and look at new_hidden. It will have length num_layers and contain different values for each layer. You can also see this in the init_hidden function, which creates a properly shaped initial hidden state, one for each layer of the RNN.
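
To make that concrete, here is a toy sketch in plain Python. It is not the awd-lstm-lm code: ToyLayer is a made-up scalar tanh recurrence with arbitrary weights, and init_hidden and forward are simplified stand-ins that only mirror the shape of the loop quoted above. The point is that the input flows up from the layer below, while each layer reads and writes only its own slot of the hidden list.

```python
import math

class ToyLayer:
    """Hypothetical scalar recurrence: h_t = tanh(w_x * x_t + w_h * h_{t-1})."""
    def __init__(self, w_x, w_h):
        self.w_x, self.w_h = w_x, w_h

    def __call__(self, inputs, h):
        outputs = []
        for x in inputs:
            h = math.tanh(self.w_x * x + self.w_h * h)
            outputs.append(h)
        return outputs, h  # per-step outputs and the final hidden state

def init_hidden(num_layers):
    # One zero state per layer.
    return [0.0] * num_layers

def forward(layers, inputs, hidden):
    raw_output, new_hidden = inputs, []
    for l, rnn in enumerate(layers):
        # Input comes from the layer below; the state comes from this layer's own slot.
        raw_output, new_h = rnn(raw_output, hidden[l])
        new_hidden.append(new_h)
    return raw_output, new_hidden

layers = [ToyLayer(0.5, 0.9), ToyLayer(-1.2, 0.3), ToyLayer(0.8, -0.7)]
hidden = init_hidden(len(layers))

# First minibatch: every layer starts from zero.
out1, hidden1 = forward(layers, [1.0, 2.0, 3.0], hidden)
# Second minibatch: every layer starts from where it left off.
out2, hidden2 = forward(layers, [1.0, 2.0, 3.0], hidden1)

print(hidden1)
print(hidden2)
```

The list passed in is never modified inside forward, which is why hidden[1] still shows zeros while stepping through the first minibatch. The updated states only appear in the returned list.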

