
chronornn's People

Contributors

atremblay, gbmarc1, jamorafo, julesgm


chronornn's Issues

Add bias initialization to 1 for the LSTM.

For the tests, the authors use an LSTM with the biases initialized to 1. Currently we don't support this, only the chrono initialization.
So we need either a new class for this or a switch on ChronoLSTM that initializes the biases to 1; that choice is up to the person working on this issue.
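
A minimal sketch of the switch option, assuming the plain torch.nn.LSTM underneath (the helper name below is made up, not part of the repo):

```python
import torch
import torch.nn as nn


def set_lstm_biases_to_one(lstm: nn.LSTM) -> None:
    """Hypothetical helper: make the effective bias of every gate equal to 1.

    PyTorch sums bias_ih and bias_hh, so we set one vector to 1 and zero
    the other; filling both with 1 would give an effective bias of 2.
    """
    with torch.no_grad():
        for name, param in lstm.named_parameters():
            if name.startswith("bias_ih"):
                param.fill_(1.0)
            elif name.startswith("bias_hh"):
                param.zero_()


lstm = nn.LSTM(input_size=10, hidden_size=128)
set_lstm_biases_to_one(lstm)
```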

Copy Task

  • Experiment one (standard copy)
    With 𝑇 = 500 and 𝑇 = 2000: one LSTM (128 hidden units) with biases initialized to 1 and another LSTM (128 hidden units) with chrono initialization.

This is to reproduce the first two graphs of Figure 3 of the paper.

copy data: data.copy_data(variable=False)

  • Experiment two (variable copy)
    With 𝑇 = 500 and 𝑇 = 1000: one LSTM (128 hidden units) with biases initialized to 1 and another LSTM (128 hidden units) with chrono initialization.

This is to reproduce the third and fourth graphs of Figure 3 of the paper.
In the variable copy task, the number of characters between the end of the sequence to copy and the signal character is drawn at random between 1 and 𝑇.

copy data: data.copy_data(variable=True)
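
A rough sketch of the four runs this implies (only data.copy_data(variable=...) appears in the repo; the grid, the init tags, and the training placeholder below are illustrative):

```python
# Illustrative experiment grid for Figure 3; the data.copy_data call is
# commented out because its handling of T is not shown in this issue.
EXPERIMENTS = [
    # (T, variable_copy)
    (500, False), (2000, False),  # experiment one: standard copy
    (500, True), (1000, True),    # experiment two: variable copy
]

for T, variable in EXPERIMENTS:
    for init in ("bias_one", "chrono"):  # hypothetical initialization tags
        print(f"copy task: T={T} variable={variable} init={init}")
        # dataset = data.copy_data(variable=variable)
        # train a 128-hidden-unit LSTM with the given initialization
```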

Fix the copyTask

The detail that needs to be implemented (from the quoted task description below): the input has 𝑇 + 20 characters and the signal character is the (𝑇 + 10)th character. At that moment the network starts outputting its value (i.e. that's where the loss starts). The network receives input for 10 more characters and then keeps going without any input for the rest of the output sequence.

The copy task checks whether a model is able to remember information for arbitrarily long durations. We use the setup from (Hochreiter & Schmidhuber, 1997; Arjovsky et al., 2016), which we summarize here. Consider an alphabet of 10 characters. The ninth character is a dummy character and the tenth character is a signal character. For a given 𝑇, input sequences consist of 𝑇 + 20 characters. The first 10 characters are drawn uniformly randomly from the first 8 letters of the alphabet. These first characters are followed by 𝑇 − 1 dummy characters, a signal character, whose aim is to signal the network that it has to provide its outputs, and the last 10 characters are dummy characters. The target sequence consists of 𝑇 + 10 dummy characters, followed by the first 10 characters of the input. This dataset is thus about remembering an input sequence for exactly 𝑇 timesteps. We also provide results for the variable copy task setup presented in (Henaff et al., 2016), where the number of characters between the end of the sequence to copy and the signal character is drawn at random between 1 and 𝑇.

Here:

https://github.com/atremblay/chronoRNN/blob/master/task/copyTask.py#L67
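
For reference, a self-contained sketch of this generation process (the index encoding, 0-7 for letters, 8 for dummy, 9 for signal, is an assumption; task/copyTask.py may encode things differently):

```python
import numpy as np

DUMMY, SIGNAL = 8, 9  # assumed encoding: 0-7 letters, 8 dummy, 9 signal


def make_copy_example(T, variable=False, rng=np.random):
    """Return one (input, target) pair of integer indices (hypothetical helper)."""
    to_copy = rng.randint(0, 8, size=10)            # 10 chars from the first 8 letters
    gap = rng.randint(1, T + 1) if variable else T  # distance to the signal character
    inp = np.concatenate([
        to_copy,
        np.full(gap - 1, DUMMY),  # dummies before the signal
        [SIGNAL],                 # the (gap + 10)th character
        np.full(10, DUMMY),       # 10 more input steps after the signal
    ])
    tgt = np.concatenate([
        np.full(gap + 10, DUMMY),  # loss only matters from the signal onwards
        to_copy,                   # the network must reproduce the prefix
    ])
    return inp, tgt


x, y = make_copy_example(T=500)
assert len(x) == len(y) == 520
```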

Reset biases for LSTM

The PyTorch LSTM implementation has two bias vectors for all the gates: input-hidden and hidden-hidden.
http://pytorch.org/docs/master/nn.html#torch.nn.LSTM

This is a bit different from the usual way of computing the gates (as seen in the paper, equations 11 to 15).

Initializing the biases could be done in the reset_bias() method, which is called in the constructor. The input-hidden biases are all in one vector (of size 4*hidden_size), and the same goes for hidden-hidden. Changing the biases for the input and forget gates will require indexing into those two vectors.

Equation 16 gives the initialization to use. But since we have two biases for each gate, I wonder whether we have to reimplement a full LSTM with only one bias per gate or whether we can split the initialization across the two biases. I'm not sure what the impact on the task results would be.

In order to reproduce the paper as closely as possible, I would be tempted to reimplement an LSTM from scratch, which shouldn't be too complicated.
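
A sketch of the splitting option, with the chrono initialization from equation 16 (b_f ~ log(U(1, T_max − 1)), b_i = −b_f): put the chrono values in bias_ih and zero bias_hh, so the summed, effective bias matches a single-bias LSTM.

```python
import torch
import torch.nn as nn


def chrono_init(lstm: nn.LSTM, t_max: int) -> None:
    """Hypothetical helper: chrono-initialize a torch.nn.LSTM.

    PyTorch orders the gates (input, forget, cell, output) inside each
    4*hidden_size bias vector.
    """
    h = lstm.hidden_size
    with torch.no_grad():
        for layer in range(lstm.num_layers):
            bias_ih = getattr(lstm, f"bias_ih_l{layer}")
            bias_hh = getattr(lstm, f"bias_hh_l{layer}")
            bias_ih.zero_()  # cell/output gate biases at zero (an assumption)
            bias_hh.zero_()  # so the effective (summed) bias equals bias_ih
            b_f = torch.log(torch.empty(h).uniform_(1, t_max - 1))
            bias_ih[0:h] = -b_f      # input gate:  b_i = -b_f
            bias_ih[h:2 * h] = b_f   # forget gate: b_f ~ log(U(1, t_max - 1))


lstm = nn.LSTM(input_size=10, hidden_size=128)
chrono_init(lstm, t_max=500)
```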

Todo list

This is not exhaustive.

  • Implement the code for the experiments
    • Models:
      • VanillaRNNs
      • LeakyRNNs
      • GatedRNNs
      • VanillaLSTM
      • ChronoLSTM
    • Tasks:
      • Add
      • Warp
      • Copy
  • Do the experiments
    • Add
      • LeakyRNNs
      • GatedRNNs
      • VanillaLSTM
      • ChronoLSTM
    • Warp
      • LeakyRNNs
      • GatedRNNs
      • VanillaLSTM
      • ChronoLSTM
    • Copy
      • LeakyRNNs
      • GatedRNNs
      • VanillaLSTM
      • ChronoLSTM
  • Complete the report

Warp and padding task

For each value of maximum_warping, the train dataset consists of 50,000 length-500 randomly warped random sequences, with either uniform or variable time warpings. The alphabet is of size 10 (including a dummy symbol). Contiguous characters are enforced to be different. After warping, each sequence is truncated to length 500. Test datasets of 10,000 sequences are generated similarly. The criterion to be minimized is the cross entropy in predicting the next character of the output sequence.
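
A rough, self-contained sketch of generating one warped sequence as described (here warping pads with dummy symbols after each character; the repo, or the paper's repetition variant, may do it differently):

```python
import numpy as np

LETTERS = 9  # symbols 0-8; 9 is the dummy (alphabet of size 10 in total)
DUMMY = 9


def warped_sequence(length, max_warp, variable=True, rng=np.random):
    """Return one warped sequence of integer indices (hypothetical helper)."""
    # Base sequence with contiguous characters forced to be different.
    seq = [rng.randint(LETTERS)]
    while len(seq) < length:
        c = rng.randint(LETTERS - 1)
        seq.append(c if c < seq[-1] else c + 1)  # skip the previous character
    # Warp: follow each character with (warp - 1) dummy symbols.
    out = []
    for c in seq:
        warp = rng.randint(1, max_warp + 1) if variable else max_warp
        out.extend([c] + [DUMMY] * (warp - 1))
        if len(out) >= length:
            break
    return np.array(out[:length])  # truncate after warping


x = warped_sequence(500, max_warp=4)
assert len(x) == 500
```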

Leaky RNN

Need to implement the leaky RNN (equation 5) with
h_{t+1} = α tanh(W_x x_t + W_h h_t + b) + (1 − α) h_t

This could probably be done directly in the Rnn class that we already have, with a different argument to turn on the leakage.
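
A standalone sketch of a cell implementing this update (the repo's Rnn class is not shown here, so the names and the fixed α below are illustrative):

```python
import torch
import torch.nn as nn


class LeakyRNNCell(nn.Module):
    """h_{t+1} = alpha * tanh(W_x x_t + W_h h_t + b) + (1 - alpha) * h_t."""

    def __init__(self, input_size, hidden_size, alpha=0.5):
        super().__init__()
        self.alpha = alpha  # fixed leak rate here; could also be made learnable
        self.Wx = nn.Linear(input_size, hidden_size)               # W_x x_t + b
        self.Wh = nn.Linear(hidden_size, hidden_size, bias=False)  # W_h h_t

    def forward(self, x, h):
        return self.alpha * torch.tanh(self.Wx(x) + self.Wh(h)) + (1 - self.alpha) * h


cell = LeakyRNNCell(input_size=10, hidden_size=128)
h = torch.zeros(1, 128)
h = cell(torch.randn(1, 10), h)
```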
