GithubHelp home page GithubHelp logo

gru-varients-main's Introduction

GRU-varients

An implementation of classical GRU (Cho, el at. 2014) along with Optimized versions (Dey, Rahul. 2017) on TensorFlow that outperforms Native tf.keras.layers.GRU(units) implementation of Keras.

Dataset

The original paper [Dey, Rahul, et al. 2017] used MNIST Dataset for the training and inference purpose, we'll be comparing them on np.random.rand(batch_size, time_width, 500) in our implementation (this ensures that bias ~ 0, following random noise cancel each others.)

Abstract

The whole code has two files - CustomGRU.py and Notebook.ipynb as different implementations of Gated Recurrent Units and an short tutorial on usage in form of Python Notebooks respectively. We compare GRU0 - Classical GRU, GRU1/GRU2/GRU3 as optimized GRU models and GRU4 as Native TensorFlow implementation of GRU in form of tf.keras.layers.GRU(units). GRUs are special RNNs which are very lighter alternative to LSTM with very few parameters and almost similar performance [Cho, et al. 2014]. We optimize classical GRUs to the versions in [Dey, Rahul. 2017] for our analysis. We found that all four manual implementations outperform the Native TensorFlow GRU implementation, which almost stopped converging after a certain epochs. Our models had slower initial convergence compared to TensorFlow's native implementation which was much sharper in the beginning but totally stopped after few epochs whereas our Models continued it. Training time was almost same for all the five models, though, GRU2/GRU3 had much faster training time and GRU4 outperformed all in training duration.

Results

Published Results vs Our finding.

Screenshot 2022-08-20 020700 (Trained on MNIST Dataset)

download (2) (Same Models, trained on Random Noise)

Comparision with Native TensorFlow Keras Model

download (3) (Blue line refers to TensorFlow Keras Native GRU Implementation)

Equations of update/reset gate in Optimized Models.

Screenshot 2022-08-20 022808

References

  1. Dey, Rahul, and Fathi M. Salem. "Gate-variants of gated recurrent unit (GRU) neural networks." 2017 IEEE 60th international midwest symposium on circuits and systems (MWSCAS). IEEE, 2017.

  2. Bengio, Y., Simard, P., and Frasconi, P. Learning Longterm Dependencies with Gradient Descent is Difficult. IEEE Trans.Neural Networks, 5(2):157–166, 1994. H. Simpson, Dumb Robots, 3rd ed., Springfield: UOS Press, 2004, pp.6-9.

  3. Chung, Junyoung, et al. "Empirical evaluation of gated recurrent neural networks on sequence modeling." arXiv preprint arXiv:1412.3555 (2014).

  4. Bengio, Yoshua, Patrice Simard, and Paolo Frasconi. "Learning long-term dependencies with gradient descent is difficult." IEEE transactions on neural networks 5.2 (1994): 157-166.

  5. Bengio, Yoshua, Nicolas Boulanger-Lewandowski, and Razvan Pascanu. "Advances in optimizing recurrent networks." 2013 IEEE international conference on acoustics, speech and signal processing. IEEE, 2013.

  6. Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural computation 9.8 (1997): 1735-1780.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.