atremblay / chronoRNN
Reproducing "Can Recurrent Neural Networks Warp Time?"
For their baseline tests the authors use an LSTM with the biases initialized to 1. We currently don't support this, only the chrono initialization.
So either add a new class for this or add a switch to `ChronoLSTM` that initializes the biases to 1 (see the sketch below); that's up to whoever takes this issue.
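A minimal sketch of the switch option, assuming the biases live in PyTorch-style `bias_ih_*` / `bias_hh_*` parameters; the `chrono` flag and the `chrono_init()` helper are hypothetical names, not existing code:

```python
import torch.nn as nn

class ChronoLSTM(nn.Module):
    ...  # existing implementation

    def reset_bias(self):
        # Hypothetical `chrono` switch: False reproduces the paper's
        # baseline, where every gate bias is initialized to 1.
        if self.chrono:
            self.chrono_init()  # existing chrono initialization (eq. 16)
        else:
            for name, param in self.named_parameters():
                if 'bias' in name:
                    nn.init.constant_(param, 1.0)
```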
This is to reproduce the first two plots of Figure 3 of the paper.
Copy data: `data.copy_data(variable=False)`
This is to reproduce the third and fourth plots of Figure 3 of the paper.
The variable copy task is the variant where the number of characters between the end of the sequence to copy and the signal character is drawn at random between 1 and T.
Copy data: `data.copy_data(variable=True)`
Highlighted in bold is the detail that needs to be implemented. The input has T + 20 characters and the signal character is the (T + 10)th character; at that moment the network starts outputting its values (i.e. that's where the loss starts). The network runs with input for 10 more characters and keeps going without any input for the rest of the output sequence.
> The copy task checks whether a model is able to remember information for arbitrarily long durations. We use the setup from (Hochreiter & Schmidhuber, 1997; Arjovsky et al., 2016), which we summarize here. Consider an alphabet of 10 characters. The ninth character is a dummy character and the tenth character is a signal character. For a given T, input sequences consist of T + 20 characters. The first 10 characters are drawn uniformly randomly from the first 8 letters of the alphabet. These first characters are followed by T − 1 dummy characters, **a signal character, whose aim is to signal the network that it has to provide its outputs, and the last 10 characters are dummy characters. The target sequence consists of T + 10 dummy characters, followed by the first 10 characters of the input.** This dataset is thus about remembering an input sequence for exactly T timesteps. We also provide results for the variable copy task setup presented in (Henaff et al., 2016), where the number of characters between the end of the sequence to copy and the signal character is drawn at random between 1 and T.
Here:
https://github.com/atremblay/chronoRNN/blob/master/task/copyTask.py#L67
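For illustration, here's a minimal sketch of a sample generator following the quoted setup (indices 0-7 are the data characters, 8 the dummy, 9 the signal); the actual copyTask.py may encode things differently:

```python
import torch

def copy_task_sample(T, variable=False):
    # 10 characters to remember, drawn from the first 8 letters.
    seq = torch.randint(0, 8, (10,))
    # Fixed task: T - 1 dummies before the signal character;
    # variable task: the gap is drawn uniformly between 1 and T.
    gap = int(torch.randint(1, T + 1, (1,))) if variable else T - 1
    inp = torch.cat([seq,
                     torch.full((gap,), 8, dtype=torch.long),
                     torch.tensor([9]),
                     torch.full((10,), 8, dtype=torch.long)])
    # Target: dummies up to and including the signal step, then the copy.
    tgt = torch.cat([torch.full((inp.numel() - 10,), 8, dtype=torch.long), seq])
    return inp, tgt
```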
The PyTorch LSTM implementation has two bias vectors for each gate: input-hidden and hidden-hidden.
http://pytorch.org/docs/master/nn.html#torch.nn.LSTM
This is a bit different from the usual way of computing the gates (as in the paper, equations 11 to 15).
Initializing the biases could be done in the method `reset_bias()`, which is called in the constructor. The input-hidden biases are all packed into one vector (of size 4*hidden_size), and likewise for the hidden-hidden biases, so changing the biases of the input and forget gates requires indexing into those two vectors.
Equation 16 gives the initialization to use. But since we have two biases for each gate, I wonder whether we have to reimplement a full LSTM with only one bias per gate, or whether we can split the initialization across the two biases (see the sketch below). I'm not sure what the impact on the task results would be.
To reproduce the paper as closely as possible I would be tempted to reimplement an LSTM from scratch, which shouldn't be too complicated.
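Here's a minimal sketch of the split option, assuming we keep `torch.nn.LSTM`, put the whole chrono bias of equation 16 (b_f ~ log(U(1, T_max − 1)), b_i = −b_f) into the input-hidden vector, and zero the hidden-hidden one so that the *sum* of the two biases matches the paper's single bias; whether that split affects the results is exactly the open question above:

```python
import torch
import torch.nn as nn

def chrono_init(lstm, t_max):
    # PyTorch packs the gate biases as [input, forget, cell, output],
    # each chunk of length hidden_size.
    h = lstm.hidden_size
    with torch.no_grad():
        for name, bias in lstm.named_parameters():
            if 'bias_ih' in name:
                b_f = torch.log(torch.empty(h).uniform_(1, t_max - 1))
                bias[h:2 * h] = b_f  # forget gate: b_f ~ log(U(1, T_max - 1))
                bias[:h] = -b_f      # input gate:  b_i = -b_f
            elif 'bias_hh' in name:
                bias.zero_()         # summed bias then equals eq. 16's
    return lstm
```

Usage would be something like `chrono_init(nn.LSTM(10, 128), t_max=500)` before training.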
This list is not exhaustive.
I have not been able to reproduce the problem consistently. I just noticed that it sometimes starts with NaN.
> For each value of `maximum_warping`, the train dataset consists of 50,000 length-500 randomly warped random sequences, with either uniform or variable time warpings. The alphabet is of size 10 (including a dummy symbol). Contiguous characters are enforced to be different. After warping, each sequence is truncated to length 500. Test datasets of 10,000 sequences are generated similarly. The criterion to be minimized is the cross entropy in predicting the next character of the output sequence.
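One plausible reading of the warping, sketched below under that assumption, is repeating each character a number of times drawn between 1 and `maximum_warping`, either once per sequence (uniform) or per character (variable), then truncating; the exact scheme should be checked against the paper:

```python
import torch

def warp(seq, max_warp, uniform=True, length=500):
    # Repeat each character between 1 and max_warp times, then truncate.
    n = seq.numel()
    if uniform:
        k = int(torch.randint(1, max_warp + 1, (1,)))
        reps = torch.full((n,), k, dtype=torch.long)
    else:
        reps = torch.randint(1, max_warp + 1, (n,))
    return torch.repeat_interleave(seq, reps)[:length]
```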
Need to implement the leaky RNN (equation 5):
h_{t+1} = α tanh(W_x x_t + W_h h_t + b) + (1 − α) h_t
This could probably be done directly in the Rnn class we already have, with an extra argument to turn the leak on; a sketch follows below.
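A minimal sketch of that cell, assuming a fixed scalar leak α; the class name and the `alpha` argument are hypothetical, not the actual Rnn API:

```python
import torch
import torch.nn as nn

class LeakyRNNCell(nn.Module):
    def __init__(self, input_size, hidden_size, alpha=0.5):
        super().__init__()
        self.wx = nn.Linear(input_size, hidden_size, bias=False)
        self.wh = nn.Linear(hidden_size, hidden_size, bias=True)  # carries b
        self.alpha = alpha

    def forward(self, x, h):
        # h_{t+1} = alpha * tanh(W_x x_t + W_h h_t + b) + (1 - alpha) * h_t
        return self.alpha * torch.tanh(self.wx(x) + self.wh(h)) \
            + (1 - self.alpha) * h
```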