
condnet

A policy-driven sparse net that uses explicit sparse dot products to go faster. See our RLDM 2015 submission, as well as our NIPS 2015 Deep RL workshop and ICLR 2016 submissions.

Policy block-dropout / Conditional neural net

Please read the paper for more details. The general idea is to drop blocks of units instead of single units (independently for each example). In addition, the blocks are not dropped randomly: we learn an input-dependent policy that outputs the participation probability of each block, for each layer.
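As a concrete illustration of the idea above, here is a minimal NumPy sketch of per-example block dropout with an input-dependent policy. All names and sizes (`W_pi`, `block_size`, etc.) are hypothetical: in the actual model the policy weights are learned, whereas here they are random for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_examples, n_units, block_size = 4, 12, 4
n_blocks = n_units // block_size

# Hypothetical layer activations and policy weights (illustrative only).
h = rng.standard_normal((n_examples, n_units))
W_pi = rng.standard_normal((n_units, n_blocks)) * 0.1

# Input-dependent participation probability of each block, per example.
probs = 1.0 / (1.0 + np.exp(-h @ W_pi))            # (n_examples, n_blocks)

# Sample a binary block mask independently for each example...
block_mask = (rng.random(probs.shape) < probs).astype(h.dtype)

# ...and expand it so every unit in a block shares its block's decision.
unit_mask = np.repeat(block_mask, block_size, axis=1)  # (n_examples, n_units)

h_sparse = h * unit_mask
```

The key difference from ordinary dropout is that `probs` depends on the input, and the mask is block-structured, which is what makes the explicit sparse products below worthwhile.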

Sparse dot product

Please note that the custom Ops implemented here do not do any shape checking yet, so be careful.
On CPU it is worthwhile to do explicit sparse products for all three sparse/full matrix combinations.
On GPU, however, the cuBLAS GEMM operation is so fast that a sparse matrix product is only worth doing when both matrices are sparse.

The sparse_dot function expects the matrices A and B, the (binary) mask Am by which A is multiplied, the mask Om by which the result of dot(A, B) is multiplied, and, for convenience, a bias vector c which is added (only where Om is nonzero), so the result is:
O = (dot(A*Am, B) + c) * Om
Note that Am and Om are not the same shape as A and O: for an (n, m) matrix, the mask has shape (n, m / block_size).

Also note that A is expected to already be zero wherever Am is zero (this matters in the GPU sparse-by-full case); this is already true when A is the output of a previous sparse operation.
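For reference, the semantics described above can be written as a dense NumPy sketch. This is not the custom Op itself, just a plain re-implementation of the formula; the sizes and the `expand` helper are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
block_size = 2
n, m, p = 3, 4, 6   # A: (n, m), B: (m, p), O: (n, p)

A = rng.standard_normal((n, m))
B = rng.standard_normal((m, p))
c = rng.standard_normal(p)

# Block masks: one bit per block of `block_size` columns.
Am = rng.integers(0, 2, (n, m // block_size)).astype(A.dtype)
Om = rng.integers(0, 2, (n, p // block_size)).astype(A.dtype)

def expand(mask, block_size):
    """Repeat each mask bit over its block of columns."""
    return np.repeat(mask, block_size, axis=1)

# Dense reference for what sparse_dot computes:
# O = (dot(A*Am, B) + c) * Om, with the masks expanded to full width.
O = (np.dot(A * expand(Am, block_size), B) + c) * expand(Om, block_size)
```

A sparse implementation gets its speedup by skipping the blocks of rows and columns whose mask bits are zero, rather than multiplying by zero as this dense version does.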

Performance

Typically, the larger the block size and the higher the sparsity, the more speedup you will get. On CPU this seems to be less true for very large matrices, while on GPU the opposite holds: the speedup is larger for large matrices.

The CPU implementation is single-core for now; the intent is for it to serve as a proxy for how well this would perform on single-core hardware (e.g. a cheap phone).

Stability

I'm still hunting down bugs: on CPU (perhaps for numerical-precision reasons?) I don't get exactly the same results as with Theano. This doesn't seem to really affect gradient descent, but, e.g., for the same random seed you will get different results.

condnet's People

Contributors

bengioe
