convlstm-for-caffe's Introduction

ConvLSTM layer

Implementation based on Jeff Donahue's LSTM implementation for Caffe.

Installation

Requires a recent version of Caffe (or, alternatively, the "recurrent" branch of Jeff Donahue's Caffe GitHub repository). To get that branch, clone with git clone -b recurrent <address>

Then copy the files in include/ and src/ to the corresponding directories of your Caffe source tree.

Patching the proto file

You need to merge the protobuf definition in patch.proto with src/caffe/proto/caffe.proto. To make this job easier, I have written a small patcher in Python; see patch_proto.py.

  • Note: I do not take any responsibility for files broken by the patcher! When in doubt, merge the files manually.
  • The patcher does create a backup file.
  • Note: The patcher is more of a quick hack. Applying a patch more than once will destroy caffe.proto.

If you do not want to use the patcher, you will have to merge the two files manually: extend the "LayerParameter" block accordingly, and add the other blocks to the end of the file.
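Because applying the patch twice breaks caffe.proto, it can help to check the file before patching. The following is a minimal sketch, not part of patch_proto.py. It assumes that a patched caffe.proto mentions the field name "lstm_convolution_param" (the name used in the Usage section), and the helper names and the ".bak" suffix are hypothetical.

```python
import shutil
from pathlib import Path

# Assumption: the patch adds a field named "lstm_convolution_param"
# (the name used in the Usage section) to caffe.proto.
MARKER = "lstm_convolution_param"


def is_already_patched(proto_text, marker=MARKER):
    """Return True if the proto text already mentions the marker."""
    return marker in proto_text


def backup_if_unpatched(proto_path, marker=MARKER):
    """Copy proto_path to '<name>.bak' unless the file looks patched.

    Returns the backup path, or None if the file already contains the
    marker (in which case the patch should not be applied again).
    """
    proto_path = Path(proto_path)
    if is_already_patched(proto_path.read_text(), marker):
        return None
    backup_path = proto_path.with_name(proto_path.name + ".bak")
    shutil.copy2(proto_path, backup_path)
    return backup_path
```

Calling backup_if_unpatched("src/caffe/proto/caffe.proto") before running the patcher gives a second backup and a cheap guard against patching twice.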

Makefile.config

We provide a working configuration file with this repository; see Makefile.config.

It was tested with g++-5 and CUDA 7.5.

Building

Once everything is prepared, run make clean && make to recompile Caffe.

Notes on Compiling

Requires C++11 to compile. Set CUSTOM_CXX := g++ -std=c++11 in Makefile.config. We have observed some bugs when compiling with g++-5 (which is not officially supported by CUDA 7.5). To avoid these problems, add -D__STRICT_ANSI__ -D_MWAITXINTRIN_H_INCLUDED -D_FORCE_INLINES to the compiler line.

Furthermore, a bug seems to appear in crop_layer.cu when using C++11 and CUDA. You can find a simple fix in fixes/.

Usage

Note that Jeff's implementation expects data of shape T x N x ..., where T is the number of timesteps and N is the number of independent streams, e.g., videos.

That means the data needs to be interleaved: <video1_t1>, <video2_t1>, <video1_t2>, <video2_t2>, etc.
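The interleaving can be illustrated with a small, framework-independent sketch. It uses plain Python lists of frame labels in place of real blobs; the function name interleave_streams is hypothetical and is not part of this repository.

```python
def interleave_streams(streams):
    """Reorder N streams of T frames each into time-major order.

    streams[n][t] is frame t of stream n. The result lists all streams
    at the first timestep, then all streams at the second, and so on
    (T x N, flattened).
    """
    if not streams:
        return []
    num_steps = len(streams[0])
    if any(len(s) != num_steps for s in streams):
        raise ValueError("all streams must have the same number of timesteps")
    # Outer loop over time, inner loop over streams: time-major order.
    return [stream[t] for t in range(num_steps) for stream in streams]


videos = [
    ["video1_t1", "video1_t2", "video1_t3"],
    ["video2_t1", "video2_t2", "video2_t3"],
]
print(interleave_streams(videos))
```

With array data the same reordering corresponds to swapping the first two axes of an N x T x ... array.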

Specifying a ConvLSTM layer

Use "lstm_convolution_param" to specify the details of the convolutional layer inside a ConvLSTM layer. It can have the following parameters:

  • "type": Whether this is an input-to-hidden operation, a hidden-to-hidden operation, or both ("input", "hidden", "all"). This means you can specify up to two lstm_convolution_param per layer (one "hidden" plus one "input", or simply one "all").
  • Default convolution parameters, such as "kernel_size", "num_outputs", "pad", etc. Certain features may not be available.
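The rule for combining "type" values can be written down as a small check. This is only a sketch of the rule as described above, not code from this repository. The function name is hypothetical, and rejecting "all" together with "input" or "hidden" is an assumption read from the wording. Whether a single "input" or "hidden" block on its own is meaningful is not stated, so the sketch does not reject it.

```python
ALLOWED_TYPES = {"input", "hidden", "all"}


def check_conv_param_types(types):
    """Validate the "type" values of a layer's lstm_convolution_param blocks.

    Raises ValueError for combinations the description rules out,
    otherwise returns True.
    """
    types = list(types)
    unknown = [t for t in types if t not in ALLOWED_TYPES]
    if unknown:
        raise ValueError(f"unknown type(s): {unknown}")
    if len(types) > 2:
        raise ValueError("at most two lstm_convolution_param blocks per layer")
    if len(set(types)) != len(types):
        raise ValueError(f"duplicate type in {types}")
    # Assumption: "all" already covers both operations, so it stands alone.
    if "all" in types and len(types) > 1:
        raise ValueError('"all" cannot be combined with "input" or "hidden"')
    return True
```

For example, check_conv_param_types(["input", "hidden"]) and check_conv_param_types(["all"]) pass, while check_conv_param_types(["all", "input"]) raises a ValueError.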

Example

For an example, please refer to the models/ directory!

Other Notes

To allow for correct gradient backpropagation, we currently use a workaround to force Caffe to propagate all the way to the latent states h_0 / c_0. This is realized by the DummyForward layer, which simply forwards data and backpropagates gradients. The layer owns a 'dummy' parameter of shape (1), which you may notice if you inspect ConvLSTM's weights. The parameter has no further meaning. This workaround may or may not be removed in the future, in which case old models may no longer be compatible, as the parameter count will have changed.

Feedback

If you find any bugs or have other feedback, please let me know! Thanks :)

Contact: s [dot] agethen [at] gmail [dot] com
