
RNN-ToyNet

This ToyNet is a really simple implementation of a recurrent neural network (RNN). It is so simple that its only extra dependency is numpy; there is no need to touch TensorFlow, Theano, or PyTorch. That makes it a good example for anyone who wants to get started with RNNs. The implementation follows the idea of iamtrask's blog, with some modifications to keep the computations stable and the code clean.

This RNN is used to solve a "binary addition" problem: given the bit strings of two integers, predict the bit string of their sum, as illustrated below.
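For example (a minimal illustration of the task, not taken from the repo):

```python
a_int, b_int = 3, 2                       # example addends
c_int = a_int + b_int                     # 5
to_bits = "{:08b}".format                 # 8-bit binary strings
print(to_bits(a_int), "+", to_bits(b_int), "=", to_bits(c_int))
# 00000011 + 00000010 = 00000101
# The RNN reads one bit of each addend per time step, right to left, and has
# to learn when to carry (e.g. 1 + 1 = 0 carry 1 at the second bit above).
```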

Below is a line-by-line explanation of the code, adapted from the original blog (the line numbers refer to the source file):

Lines 0-8: Importing our dependencies and seeding the random number generator. We only use numpy and copy: numpy for the matrix algebra, copy to deep-copy the hidden state between time steps. If numpy cannot be imported, an error is raised.

Line 11: Create a class for the RNN model, so that its parameters stay flexible.

Lines 13-24: The constructor of the class, which stores our input parameters.
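A minimal sketch of what this setup might look like (class and parameter names are illustrative, not necessarily the repo's):

```python
import copy                      # used to deep-copy hidden states between time steps
import numpy as np               # matrix algebra

np.random.seed(0)                # make training runs reproducible


class RNNToyNet:
    def __init__(self, input_dim=2, hidden_dim=16, output_dim=1,
                 binary_dim=8, alpha=0.1, iteration=100000):
        # Store the hyper-parameters so each instance stays flexible.
        self.input_dim = input_dim      # two addend bits per time step
        self.hidden_dim = hidden_dim    # size of the recurrent hidden state
        self.output_dim = output_dim    # one predicted sum bit
        self.binary_dim = binary_dim    # number of bits per integer
        self.alpha = alpha              # learning rate
        self.iteration = iteration      # number of training iterations
```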

Lines 28-40: Initializing the weights that will be used later.

Line 29: We create a lookup table that maps an integer to its binary representation. The binary representations will be our input and output data for each math problem we ask the network to solve. This lookup table is very helpful when converting integers to bit strings.
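One way to build such a table with numpy (a sketch assuming 8-bit integers; names are illustrative):

```python
import numpy as np

binary_dim = 8
largest_number = 2 ** binary_dim          # 256

# Map every integer 0..255 to its 8-bit binary representation.
bits = np.unpackbits(
    np.arange(largest_number).astype(np.uint8).reshape(-1, 1), axis=1)
int2binary = {i: bits[i] for i in range(largest_number)}

print(int2binary[5])                      # [0 0 0 0 0 1 0 1]
```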

Lines 42-48: Our nonlinearity and its derivative. To ensure numerical stability (not strictly necessary for a network this simple), a numerically stable variant of the sigmoid is used instead of the version in the original blog.
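One possible numerically stable variant (a sketch; the repo's exact formulation may differ) avoids exponentiating large positive numbers:

```python
import numpy as np

def sigmoid(x):
    # Stable logistic: only ever computes exp of a non-positive number.
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)
    pos = x >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    exp_x = np.exp(x[~pos])
    out[~pos] = exp_x / (1.0 + exp_x)
    return out

def sigmoid_derivative(y):
    # Derivative written in terms of the sigmoid's output y = sigmoid(x).
    return y * (1.0 - y)

print(sigmoid(np.array([-1000.0, 0.0, 1000.0])))   # ~ [0, 0.5, 1], no overflow warnings
```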

input_dim: We are adding two numbers together, so we'll be feeding in two bit strings, one bit of each per time step. Thus, we need two inputs to the network (one for each of the numbers being added).

hidden_dim: This is the size of the hidden layer that will be storing our carry bit. Notice that it is way larger than it theoretically needs to be. Play with this and see how it affects the speed of convergence. Do larger hidden dimensions make things train faster or slower? More iterations or fewer?

output_dim: Well, we're only predicting the sum, which is one number. Thus, we only need one output.

w_0: This is the matrix of weights that connects our input layer and our hidden layer. Thus, it has "input_dim" rows and "hidden_dim" columns (2 x 16 unless you change it). If you forgot what it does, look for it in the pictures in Part 2 of the original blog post.

w_1: This is the matrix of weights that connects the hidden layer to the output layer. Thus, it has "hidden_dim" rows and "output_dim" columns (16 x 1 unless you change it). If you forgot what it does, look for it in the pictures in Part 2 of the original blog post.

w_h: This is the matrix of weights that connects the hidden layer in the previous time step to the hidden layer in the current time step. It also connects the hidden layer in the current time step to the hidden layer in the next time step (we keep reusing it). Thus, it has "hidden_dim" rows and "hidden_dim" columns (16 x 16 unless you change it). If you forgot what it does, look for it in the pictures in Part 2 of the original blog post.

Lines 38-40: These store the weight updates that we would like to make for each of the weight matrices. After we've accumulated several weight updates, we'll actually update the matrices. More on this later.
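A sketch of the initialization described above (uniform weights in [-1, 1) as in the original blog; the repo may use a different scheme):

```python
import numpy as np

input_dim, hidden_dim, output_dim = 2, 16, 1

# Weight matrices, shaped as described above.
w_0 = 2 * np.random.random((input_dim, hidden_dim)) - 1    # input  -> hidden  (2 x 16)
w_1 = 2 * np.random.random((hidden_dim, output_dim)) - 1   # hidden -> output  (16 x 1)
w_h = 2 * np.random.random((hidden_dim, hidden_dim)) - 1   # hidden -> hidden  (16 x 16)

# Accumulators for the weight updates; only applied after a full backward pass.
w_0_update = np.zeros_like(w_0)
w_1_update = np.zeros_like(w_1)
w_h_update = np.zeros_like(w_h)
```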

Line 50: By default we iterate over 100,000 training examples (iterations). Since the training process is defined as a method of the RNN class, the variable "iteration" can be changed when the model is instantiated.
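Hypothetical usage, reusing the constructor sketch above (the repo's actual class and method names may differ):

```python
# Train with a smaller iteration count than the 100,000 default.
rnn = RNNToyNet(iteration=20000)
print(rnn.iteration)            # 20000
# rnn.train()                   # hypothetical name for the training method described below
```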

Line 54: We're going to generate a random addition problem. So, we initialize an integer randomly between 0 and half of the largest value we can represent. If we allowed the network to represent more than this, then adding two numbers could overflow (produce a bigger number than we have bits to represent). Thus, we only add numbers that are less than half of the largest number we can represent.

Line 55: We look up the binary form of "a_int" and store it in "a".

Line 57: Same thing as line 54, just getting another random number.

Line 58: Same thing as line 55, looking up the binary representation.

Line 61: We compute what the correct answer should be for this addition.

Line 62: We convert the true answer to its binary representation.
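A sketch of one randomly generated training example (variable names follow the blog; int2binary is the lookup table from above):

```python
import numpy as np

binary_dim = 8
largest_number = 2 ** binary_dim
int2binary = {i: np.unpackbits(np.array([i], dtype=np.uint8))
              for i in range(largest_number)}

# Keep both addends below half the representable range so the sum cannot overflow.
a_int = np.random.randint(largest_number // 2)
b_int = np.random.randint(largest_number // 2)
a = int2binary[a_int]            # binary form of the first addend
b = int2binary[b_int]            # binary form of the second addend

c_int = a_int + b_int            # the true answer...
c = int2binary[c_int]            # ...and its binary representation
```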

Line 65: Initializing an empty binary array where we'll store the neural network's predictions (so we can see them at the end). You could get around doing this if you wanted, but I thought it made things more intuitive.

Line 68: Resetting the error measure (which we use as a means to track convergence; see the original blog's tutorials on backpropagation and gradient descent to learn more about this).

Lines 70-71: These two lists will keep track of the layer 2 derivatives and layer 1 values at each time step.

Line 72: Time step zero has no previous hidden layer, so we initialize one that is all zeros.
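The bookkeeping for one training example might look like this (a sketch; names are illustrative):

```python
import numpy as np

binary_dim, hidden_dim = 8, 16

d = np.zeros(binary_dim)                      # where the predicted bits will be stored
overall_error = 0                             # scalar error, used to track convergence

layer_2_deltas = []                           # output-layer derivative at each time step
layer_1_values = [np.zeros((1, hidden_dim))]  # hidden states, starting from an all-zero state
```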

Line 77: This for loop iterates over the positions of the binary representation.

Line 80: X is the same as "layer_0" in the pictures. X is a list of two numbers, one from a and one from b. It's indexed according to the "position" variable, but we index it so that it goes from right to left. So, when position == 0, this is the farthest bit to the right in "a" and the farthest bit to the right in "b". When position equals 1, this shifts one bit to the left.

Line 82: Same indexing as line 80, but instead it's the value of the correct answer (either a 1 or a 0).

Line 86: This is the magic! Make sure you understand this line. To construct the hidden layer, we do two things. First, we propagate from the input to the hidden layer (np.dot(X, w_0)). Then, we propagate from the previous hidden layer to the current hidden layer (np.dot(prev_layer_1, w_h)). Then we sum these two vectors and pass the result through the sigmoid function.

So, how do we combine the information from the previous hidden layer and the input? After each has been propagated through its various matrices (read: interpretations), we sum the information.
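In code, that key line is roughly the following (a minimal sketch using the weight names from above; prev_layer_1 stands for the hidden state of the previous time step):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))          # plain version is enough for this sketch

input_dim, hidden_dim = 2, 16
w_0 = 2 * np.random.random((input_dim, hidden_dim)) - 1
w_h = 2 * np.random.random((hidden_dim, hidden_dim)) - 1

X = np.array([[1, 0]])                        # one bit from a, one bit from b
prev_layer_1 = np.zeros((1, hidden_dim))      # previous hidden state

# input contribution + recurrent contribution, then squash through the sigmoid
layer_1 = sigmoid(np.dot(X, w_0) + np.dot(prev_layer_1, w_h))
print(layer_1.shape)                          # (1, 16)
```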

Line 89: This should look very familiar; it's the same as in the previous tutorials. It propagates the hidden layer to the output to make a prediction.

Line 92: Compute by how much the prediction missed.

Line 93: We store the output-layer derivative (mustard orange in the original blog's graphic) in a list, holding the derivative at each time step.

Line 94: Calculate the sum of the absolute errors so that we have a scalar error (to track convergence). We'll end up with the sum of the error at each binary position.

Line 97: Rounds the output (to a binary value, since it is between 0 and 1) and stores it in the designated slot of d.

Line 100: Copies the layer_1 value into the list of hidden states so that, at the next time step, the current hidden layer can be used as the previous one.
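Putting lines 77-100 together, a self-contained sketch of the forward pass over one training example might look like this (variable names follow the blog and the sketches above; the repo's code may differ in detail):

```python
import copy
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(y):
    return y * (1.0 - y)

binary_dim, input_dim, hidden_dim, output_dim = 8, 2, 16, 1
w_0 = 2 * np.random.random((input_dim, hidden_dim)) - 1
w_1 = 2 * np.random.random((hidden_dim, output_dim)) - 1
w_h = 2 * np.random.random((hidden_dim, hidden_dim)) - 1

a = np.array([0, 0, 0, 0, 0, 0, 1, 1])        # 3
b = np.array([0, 0, 0, 0, 0, 0, 1, 0])        # 2
c = np.array([0, 0, 0, 0, 0, 1, 0, 1])        # 5 (the true sum)

d = np.zeros(binary_dim)                      # predicted bits
overall_error = 0
layer_2_deltas = []                           # output derivative per time step
layer_1_values = [np.zeros((1, hidden_dim))]  # hidden states; time step zero is all zeros

for position in range(binary_dim):
    # Read one bit of a and b, right to left, plus the matching target bit.
    X = np.array([[a[binary_dim - position - 1], b[binary_dim - position - 1]]])
    y = np.array([[c[binary_dim - position - 1]]])

    # Hidden state: input contribution + recurrent contribution.
    layer_1 = sigmoid(np.dot(X, w_0) + np.dot(layer_1_values[-1], w_h))
    layer_2 = sigmoid(np.dot(layer_1, w_1))                  # predicted sum bit

    layer_2_error = y - layer_2                              # how much we missed by
    layer_2_deltas.append(layer_2_error * sigmoid_derivative(layer_2))
    overall_error += np.abs(layer_2_error[0][0])

    d[binary_dim - position - 1] = np.round(layer_2[0][0])   # store the rounded prediction
    layer_1_values.append(copy.deepcopy(layer_1))            # keep for the next step / backprop
```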

Line 105: So, we've done all the forward propagation for all the time steps, and we've computed the derivatives at the output layer and stored them in a list. Now we need to backpropagate, starting with the last time step and working back to the first.

Line 107: Indexing the input data like we did before.

Line 108: Selecting the current hidden layer from the list.

Line 109: Selecting the previous hidden layer from the list.

Line 112: Selecting the current output error from the list.

Line 115: This computes the current hidden layer error, given the error at the hidden layer from the future time step and the error at the current output layer.

Lines 118-120: Now that we have the derivatives backpropagated at this time step, we can construct our weight updates (but not actually update the weights just yet). We don't update the weight matrices until we've fully backpropagated everything. Why? Because we use the weight matrices during backpropagation, so we don't want to change them until the actual backprop is done. See the backprop blog post for more details.
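Continuing the forward-pass sketch above, the backward pass might look like this (it only accumulates updates; the weights themselves are not touched yet):

```python
# Assumes w_0, w_1, w_h, layer_1_values, layer_2_deltas, a, b, binary_dim,
# hidden_dim and sigmoid_derivative from the forward-pass sketch above.
future_layer_1_delta = np.zeros((1, hidden_dim))
w_0_update = np.zeros_like(w_0)
w_1_update = np.zeros_like(w_1)
w_h_update = np.zeros_like(w_h)

for position in range(binary_dim):
    # position 0 here is the leftmost bit, i.e. the *last* forward time step.
    X = np.array([[a[position], b[position]]])
    layer_1 = layer_1_values[-position - 1]          # hidden state at this time step
    prev_layer_1 = layer_1_values[-position - 2]     # hidden state one step earlier

    layer_2_delta = layer_2_deltas[-position - 1]    # output error at this time step
    # Hidden error = error flowing back from the future hidden layer
    #              + error flowing back from the current output layer.
    layer_1_delta = (np.dot(future_layer_1_delta, w_h.T) +
                     np.dot(layer_2_delta, w_1.T)) * sigmoid_derivative(layer_1)

    # Accumulate the weight updates without applying them yet.
    w_1_update += np.dot(layer_1.T, layer_2_delta)
    w_h_update += np.dot(prev_layer_1.T, layer_1_delta)
    w_0_update += np.dot(X.T, layer_1_delta)

    future_layer_1_delta = layer_1_delta
```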

Lines 125-132: Now that we've backpropagated everything and created our weight updates, it's time to update our weights (and reset the update variables).
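And the update step itself, continuing the same sketch (alpha is the learning rate):

```python
alpha = 0.1

# Apply the accumulated updates, scaled by the learning rate...
w_0 += w_0_update * alpha
w_1 += w_1_update * alpha
w_h += w_h_update * alpha

# ...then reset the accumulators for the next training example.
w_0_update *= 0
w_1_update *= 0
w_h_update *= 0
```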

Line 135 to end: Just some nice logging to show progress.
