cudnn-rnn-benchmarks

All benchmarks are reported for a host with the following specifications:

* NVIDIA GeForce GTX TITAN X GPU

* Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz

* CUDA 8.0, cuDNN v5105

These benchmarks aim to quantify the performance gains from using the cuDNN RNN implementation (https://devblogs.nvidia.com/parallelforall/optimizing-recurrent-neural-networks-cudnn-5/) in Theano.

The benchmarks are evaluated similarly to https://github.com/glample/rnn-benchmarks, which compares RNN implementations across different deep learning frameworks. Results will eventually be integrated into that repository.

Note: Results for regular RNNs cannot be compared directly between the two repositories, because this benchmark uses the new Theano GPU backend, libgpuarray (https://github.com/Theano/libgpuarray), and different hardware.

The recurrent networks take as input a 3D tensor of shape batch_size x seq_length x hidden_size, output all hidden states, compute an MSE loss at each step, and compute the gradients of the error with respect to each parameter. hidden_size specifies the size of both the input and output layers of the networks.
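To make the shapes concrete, here is a minimal sketch of the per-step MSE loss on a batch_size x seq_length x hidden_size tensor. It uses plain Python lists rather than Theano tensors so that it runs without dependencies. The helper names `make_tensor` and `mse_per_step` are hypothetical, and averaging over the batch and hidden dimensions is an assumption; the benchmark scripts may reduce the loss differently.

```python
import random


def make_tensor(batch_size, seq_length, hidden_size, rng):
    """Build a nested list of shape batch_size x seq_length x hidden_size."""
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(hidden_size)]
         for _ in range(seq_length)]
        for _ in range(batch_size)
    ]


def mse_per_step(outputs, targets):
    """Return one MSE value per time step.

    Assumption: each step's loss is the mean squared error over the
    batch and hidden dimensions.
    """
    batch_size = len(outputs)
    seq_length = len(outputs[0])
    hidden_size = len(outputs[0][0])
    losses = []
    for t in range(seq_length):
        total = 0.0
        for b in range(batch_size):
            for k in range(hidden_size):
                diff = outputs[b][t][k] - targets[b][t][k]
                total += diff * diff
        losses.append(total / (batch_size * hidden_size))
    return losses


rng = random.Random(0)
hidden_states = make_tensor(batch_size=4, seq_length=3, hidden_size=8, rng=rng)
targets = make_tensor(batch_size=4, seq_length=3, hidden_size=8, rng=rng)
step_losses = mse_per_step(hidden_states, targets)
print(len(step_losses))  # one loss value per time step
```

In the actual benchmarks the hidden states come from the recurrent network and the gradients are taken through this loss; the sketch only illustrates the tensor layout and the per-step reduction.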

The code for the scripts we ran is available. The code for the regular Theano RNN implementations was borrowed from the rnn-benchmarks repository.

The reported Train time is the average time needed to run a forward and backward pass for a single training example; smaller is better.
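One way to obtain such a per-example figure is to time many batches and divide by the total number of examples processed. The sketch below assumes this procedure; the function names, the warm-up count, and the number of timed batches are all hypothetical and not taken from the benchmark scripts. It also assumes `step_fn` blocks until the work is finished, which matters for GPU code, where calls can return before the computation completes.

```python
import time


def per_example_us(total_seconds, n_batches, batch_size):
    """Convert total wall-clock time into microseconds per training example."""
    return total_seconds / (n_batches * batch_size) * 1e6


def time_training_step(step_fn, batch_size, n_batches=100, n_warmup=10):
    """Time step_fn over n_batches calls and return microseconds per example.

    step_fn is assumed to process one batch of batch_size examples and
    to return only once the computation has finished.
    """
    for _ in range(n_warmup):  # exclude one-off setup costs from the timing
        step_fn()
    start = time.perf_counter()
    for _ in range(n_batches):
        step_fn()
    elapsed = time.perf_counter() - start
    return per_example_us(elapsed, n_batches, batch_size)


# Usage with a stand-in workload instead of a real training function.
def dummy_step():
    sum(i * i for i in range(1000))


print(time_training_step(dummy_step, batch_size=32))
```

Dividing by the batch size is what makes results for batch size 32 and batch size 128 comparable on a per-example basis.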

A more exhaustive grid search will be done soon.

Note: Compile times, although not reported, are much shorter for the cuDNN implementation.

LSTM - cuDNN LSTM and GRU vs FastLSTM in rnn.py

The LSTM implementation used for these benchmarks does not use peephole connections between the cell and the gates.
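For reference, a single step of an LSTM without peephole connections can be sketched as follows. This is an illustrative pure-Python version of the standard formulation, not the code from rnn.py or cuDNN; the parameter layout (`params[name] = (W, U, b)`) and the function names are assumptions made for this example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, x, U, h, b):
    """Return W @ x + U @ h + b for list-based matrices and vectors."""
    return [
        sum(w * xi for w, xi in zip(W_row, x))
        + sum(u * hi for u, hi in zip(U_row, h))
        + b_i
        for W_row, U_row, b_i in zip(W, U, b)
    ]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step without peepholes.

    Every gate sees only the input x and the previous hidden state h_prev.
    A peephole variant would additionally feed the cell state into the gates.
    """
    def gate(name, activation):
        W, U, b = params[name]
        return [activation(v) for v in affine(W, x, U, h_prev, b)]

    i = gate("i", sigmoid)    # input gate
    f = gate("f", sigmoid)    # forget gate
    o = gate("o", sigmoid)    # output gate
    g = gate("g", math.tanh)  # candidate cell update

    c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c


# Usage with hidden size 1 and all-zero parameters.
zero = ([[0.0]], [[0.0]], [0.0])
params = {"i": zero, "f": zero, "o": zero, "g": zero}
h, c = lstm_step([0.0], [0.0], [1.0], params)
print(h, c)
```

With all-zero parameters every sigmoid gate evaluates to 0.5 and the candidate update to 0, so the new cell state is half the previous one.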

Depth 1

Batch Size 32 x Seq Len 30

Hidden Size 128

| Version | Train (µs) | Forward only (µs) |
| --- | --- | --- |
| Theano LSTM | 204.5 | 57.1 |
| cuDNN Theano LSTM | 118.8 | 59.5 |
| cuDNN Theano GRU | 117.4 | 57.6 |

Hidden Size 512

| Version | Train (µs) | Forward only (µs) |
| --- | --- | --- |
| Theano LSTM | 530.9 | 148.1 |
| cuDNN Theano LSTM | 223.2 | 102.4 |
| cuDNN Theano GRU | 184.6 | 77.6 |

Hidden Size 1024

| Version | Train (µs) | Forward only (µs) |
| --- | --- | --- |
| Theano LSTM | 1102.0 | 294.0 |
| cuDNN Theano LSTM | 601.8 | 161.1 |
| cuDNN Theano GRU | 394.8 | 136.2 |

Batch Size 128 x Seq Len 30

Hidden Size 128

| Version | Train (µs) | Forward only (µs) |
| --- | --- | --- |
| Theano LSTM | 200.8 | 52.8 |
| cuDNN Theano LSTM | 33.4 | 15.0 |
| cuDNN Theano GRU | 32.2 | 14.4 |

Hidden Size 512

| Version | Train (µs) | Forward only (µs) |
| --- | --- | --- |
| Theano LSTM | 491.0 | 138.2 |
| cuDNN Theano LSTM | 100.8 | 31.7 |
| cuDNN Theano GRU | 83.3 | 26.5 |

Hidden Size 1024

| Version | Train (µs) | Forward only (µs) |
| --- | --- | --- |
| Theano LSTM | 1000.1 | 291.8 |
| cuDNN Theano LSTM | 221.2 | 69.0 |
| cuDNN Theano GRU | 181.3 | 59.1 |

Depth 3

Batch Size 128 x Seq Len 30

Hidden Size 512

| Version | Train (µs) | Forward only (µs) |
| --- | --- | --- |
| Theano LSTM | 778.3 | 418.3 |
| cuDNN Theano LSTM | 244.9 | 70.2 |
| cuDNN Theano GRU | 197.1 | 55.7 |

Hidden Size 1024

| Version | Train (µs) | Forward only (µs) |
| --- | --- | --- |
| Theano LSTM | 1592.8 | 882.7 |
| cuDNN Theano LSTM | 820.6 | 256.8 |
| cuDNN Theano GRU | 639.5 | 195.2 |

Batch Size 128 x Seq Len 200

Hidden Size 512

| Version | Train (µs) | Forward only (µs) |
| --- | --- | --- |
| Theano LSTM | 2196.6 | 1168.1 |
| cuDNN Theano LSTM | 1539.5 | 485.9 |
| cuDNN Theano GRU | 1253.8 | 386.4 |

Hidden Size 1024

| Version | Train (µs) | Forward only (µs) |
| --- | --- | --- |
| Theano LSTM | 5711.1 | 3427.9 |
| cuDNN Theano LSTM | 5342.5 | 1692.1 |
| cuDNN Theano GRU | 4163.4 | 1274.5 |
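
The speedup of the cuDNN LSTM over the Theano LSTM can be read off the tables above by dividing the baseline time by the cuDNN time. The short sketch below does this for three of the reported configurations; the dictionary layout and the function name `speedups` are choices made for this example, while the numbers are copied from the tables.

```python
# Reported (train_us, forward_us) per training example, copied from the tables.
# Key: (depth, batch_size, seq_len, hidden_size)
RESULTS = {
    (1, 32, 30, 128): {
        "theano_lstm": (204.5, 57.1),
        "cudnn_lstm": (118.8, 59.5),
    },
    (1, 128, 30, 1024): {
        "theano_lstm": (1000.1, 291.8),
        "cudnn_lstm": (221.2, 69.0),
    },
    (3, 128, 200, 1024): {
        "theano_lstm": (5711.1, 3427.9),
        "cudnn_lstm": (5342.5, 1692.1),
    },
}


def speedups(config):
    """Return (train_speedup, forward_speedup) of cuDNN LSTM over Theano LSTM."""
    base_train, base_fwd = RESULTS[config]["theano_lstm"]
    cudnn_train, cudnn_fwd = RESULTS[config]["cudnn_lstm"]
    return base_train / cudnn_train, base_fwd / cudnn_fwd


for config in RESULTS:
    train_x, fwd_x = speedups(config)
    print(config, f"train {train_x:.2f}x, forward {fwd_x:.2f}x")
```

For these three configurations the training speedup ranges from roughly 1.07x (depth 3, sequence length 200, hidden size 1024) to roughly 4.5x (depth 1, batch size 128, hidden size 1024). At batch size 32 and hidden size 128 the forward-only ratio is slightly below 1, meaning cuDNN was marginally slower there.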
