Restricted Boltzmann Machine (RBM) for MNIST reconstruction

TensorFlow implementation of a Restricted Boltzmann Machine (RBM) for MNIST digits reconstruction.

Check this video for some background.

Requirements

Python 2.7 or 3.5
TensorFlow 1.0.1+

RBM Graphical Model

Restricted Boltzmann Machines (RBMs) are a class of undirected probabilistic graphical models containing a layer of observable variables and a single layer of latent variables. In RBMs, there are no connections within a layer.

The whole system (hidden and visible nodes) is described by an energy function:

E(v,h) = -v^{T}Wh -v^{T}b - h^{T}c
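As a minimal sketch of the formula above, the energy of a single configuration can be computed with NumPy. The toy sizes (3 visible units, 2 hidden units) and the parameter values are assumptions chosen for illustration; `W` is taken to have shape (visible, hidden), `b` to be the visible bias and `c` the hidden bias, as the energy function implies.

```python
import numpy as np

def energy(v, h, W, b, c):
    """E(v, h) = -v^T W h - v^T b - h^T c for binary vectors v and h."""
    return -(v @ W @ h) - (v @ b) - (h @ c)

# Toy RBM: 3 visible units, 2 hidden units (illustrative values only).
W = np.array([[1.0, -1.0],
              [0.5,  0.0],
              [0.0,  2.0]])
b = np.array([0.1, 0.2, 0.3])   # visible biases
c = np.array([-0.5, 0.5])       # hidden biases

v = np.array([1.0, 0.0, 1.0])
h = np.array([0.0, 1.0])
print(energy(v, h, W, b, c))
```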

As in statistical physics, high-energy configurations are less probable. The joint probability distribution is defined as:

p(v,h) = e^{-E(v,h)}/Z where Z is the partition function (intractable)
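To make the role of Z concrete, the sketch below enumerates every configuration of a tiny RBM and normalizes by brute force. The sizes and random weights are assumptions for illustration. The sum has 2^(visible + hidden) terms, which is why this is only feasible for toy models and not for the 784 visible units of MNIST.

```python
import itertools
import numpy as np

def energy(v, h, W, b, c):
    return -(v @ W @ h) - (v @ b) - (h @ c)

def joint_distribution(W, b, c):
    """Return (configs, probs): every (v, h) pair and p(v, h) = exp(-E) / Z."""
    n_visible, n_hidden = W.shape
    configs = [
        (np.array(v, dtype=float), np.array(h, dtype=float))
        for v in itertools.product([0, 1], repeat=n_visible)
        for h in itertools.product([0, 1], repeat=n_hidden)
    ]
    unnormalized = np.array([np.exp(-energy(v, h, W, b, c)) for v, h in configs])
    Z = unnormalized.sum()  # partition function: 2**(n_visible + n_hidden) terms
    return configs, unnormalized / Z

rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(3, 2))  # toy weights, not trained
b = np.zeros(3)
c = np.zeros(2)
configs, probs = joint_distribution(W, b, c)
print(len(configs), probs.sum())
```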

Our goal is to learn the parameters of the joint distribution that maximize the probability the model assigns to the training data, also known as the likelihood.

p(v) = sum_{h}p(v,h) = e^{-F(v)}/Z where F(v) is called the free energy
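For binary hidden units, the sum over h can be carried out analytically, giving the standard closed form F(v) = -v^{T}b - sum_{j} log(1 + e^{c_{j} + v^{T}W_{:j}}). This closed form is not stated in this README. The sketch below checks it against a brute-force sum over hidden configurations on a toy model whose sizes and weights are assumptions.

```python
import itertools
import numpy as np

def free_energy(v, W, b, c):
    """F(v) = -v^T b - sum_j log(1 + exp(c_j + v^T W[:, j]))."""
    pre_activation = c + v @ W
    # np.logaddexp(0, x) computes log(1 + exp(x)) without overflow.
    return -(v @ b) - np.logaddexp(0.0, pre_activation).sum()

def free_energy_brute_force(v, W, b, c):
    """-log sum_h exp(-E(v, h)), enumerating all hidden configurations."""
    total = 0.0
    for h in itertools.product([0, 1], repeat=W.shape[1]):
        h = np.array(h, dtype=float)
        total += np.exp(v @ W @ h + v @ b + h @ c)
    return -np.log(total)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # toy weights, not trained
b = rng.normal(size=4)
c = rng.normal(size=3)
v = np.array([1.0, 0.0, 1.0, 1.0])
print(free_energy(v, W, b, c), free_energy_brute_force(v, W, b, c))
```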

Inference

The conditional distributions factorize over units because there are no intra-layer connections:

p(h_{j}=1|v) = p(h_{j}=1, v) / ( p(h_{j}=0, v) + p(h_{j}=1, v) ) = sigmoid(c_{j}+v^{T}W_{:j})
p(v_{i}=1|h) = sigmoid(b_{i}+W_{i:}h)
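The two conditionals above translate directly into vectorized code. The following is a sketch with NumPy, not the repository's TensorFlow code. The function names and toy parameters are assumptions, and `W` is again taken to have shape (visible, hidden).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_h_given_v(v, W, c):
    """Vector of p(h_j = 1 | v) = sigmoid(c_j + v^T W[:, j])."""
    return sigmoid(c + v @ W)

def p_v_given_h(h, W, b):
    """Vector of p(v_i = 1 | h) = sigmoid(b_i + W[i, :] h)."""
    return sigmoid(b + W @ h)

def sample_bernoulli(probs, rng):
    """Draw independent binary units with the given activation probabilities."""
    return (rng.random(probs.shape) < probs).astype(float)

# Toy parameters (illustrative values only).
W = np.array([[1.0, -1.0],
              [0.5,  0.0],
              [0.0,  2.0]])
b = np.array([0.1, 0.2, 0.3])
c = np.array([-0.5, 0.5])

rng = np.random.default_rng(0)
v = np.array([1.0, 0.0, 1.0])
p_h = p_h_given_v(v, W, c)           # hidden activation probabilities
h_sample = sample_bernoulli(p_h, rng)
p_v = p_v_given_h(h_sample, W, b)    # reconstruction probabilities
```

Because the units within a layer are conditionally independent, a whole layer is sampled in one vectorized step.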

Learning

The parameters of our model are the weights W and the biases b, c.

Maximizing the log-Likelihood

Derive log-likelihood and gradient formulas. (TODO)

It is impractical to compute the exact log-likelihood gradient, because it involves an expectation under the model's joint distribution.
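As a sketch of where the derivation leads, here is the standard result for energy-based models, stated without the intermediate steps and not taken from this repository. Here $\tilde{v}$ ranges over all visible configurations and $\theta$ is any of W, b, c:

$$
\frac{\partial \log p(v)}{\partial \theta}
= -\frac{\partial F(v)}{\partial \theta}
+ \sum_{\tilde{v}} p(\tilde{v})\,\frac{\partial F(\tilde{v})}{\partial \theta},
\qquad
F(v) = -v^{T}b - \sum_{j} \log\left(1 + e^{c_{j} + v^{T}W_{:j}}\right)
$$

$$
\frac{\partial \log p(v)}{\partial W_{ij}} = v_{i}\,p(h_{j}=1|v) - \mathbb{E}_{\tilde{v}\sim p}\left[\tilde{v}_{i}\,p(h_{j}=1|\tilde{v})\right]
$$

$$
\frac{\partial \log p(v)}{\partial b_{i}} = v_{i} - \mathbb{E}_{\tilde{v}\sim p}\left[\tilde{v}_{i}\right],
\qquad
\frac{\partial \log p(v)}{\partial c_{j}} = p(h_{j}=1|v) - \mathbb{E}_{\tilde{v}\sim p}\left[p(h_{j}=1|\tilde{v})\right]
$$

The first term in each gradient depends only on the training example. The second is the model expectation, a sum over every visible configuration, and is the intractable part.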

Contrastive divergence

Idea:

Replace the expectation by a point estimate at v'
Obtain the point v' by Gibbs Sampling
Start sampling chain at v(t)
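The Gibbs chain in the list above can be sketched as follows: alternate between sampling h given v and v given h, starting from a training example. The function name `gibbs_chain` and the toy parameters are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_chain(v0, W, b, c, k, rng):
    """Run k alternating Gibbs steps starting at v0 and return the final v'."""
    v = v0
    for _ in range(k):
        # Sample h ~ p(h | v), then v ~ p(v | h).
        h = (rng.random(c.shape) < sigmoid(c + v @ W)).astype(float)
        v = (rng.random(b.shape) < sigmoid(b + W @ h)).astype(float)
    return v

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(6, 3))  # toy weights, not trained
b = np.zeros(6)
c = np.zeros(3)
v_t = np.array([1.0, 1.0, 0.0, 0.0, 1.0, 0.0])  # stand-in for a training example
v_prime = gibbs_chain(v_t, W, b, c, k=1, rng=rng)
```

Starting the chain at a training example rather than at a random state is what lets a very short chain (often k = 1) give a usable estimate.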

1-step contrastive divergence (CD-1), where h(v) denotes the vector of probabilities p(h_{j}=1|v):

Positive phase: $v h(v)^{T}$
Negative phase: $v' h(v')^{T}$ where v' is reconstructed from a sample of h drawn from p(h|v)

Pseudocode:

For each training example v(t):

i. Generate a negative sample v' using k steps of Gibbs Sampling, starting at v(t)

ii. Update parameters

 $W_{new} = W_{old} + \epsilon * (v(t)h(v(t))^{T}-v'h(v')^{T}) $

 $b_{new} = b_{old} + \epsilon * (v(t)-v')$

 $c_{new} = c_{old} + \epsilon * (h(v(t))-h(v'))$

Repeat from step i until a stopping criterion is met.
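The pseudocode above can be sketched in NumPy as a single-example CD-k update. This is an illustrative sketch, not the repository's TensorFlow implementation. It follows the energy function's convention that `W` has shape (visible, hidden), `b` is the visible bias and `c` is the hidden bias. The function name, the toy dataset and the hyperparameters are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd_k_update(v_t, W, b, c, epsilon, k, rng):
    """Apply one CD-k update in place for a single training example v_t."""
    # h(v) denotes the vector of hidden probabilities p(h_j = 1 | v).
    h_data = sigmoid(c + v_t @ W)

    # i. Negative sample v' from k steps of Gibbs sampling starting at v_t.
    v_neg = v_t
    for _ in range(k):
        h_sample = (rng.random(c.shape) < sigmoid(c + v_neg @ W)).astype(float)
        v_neg = (rng.random(b.shape) < sigmoid(b + W @ h_sample)).astype(float)
    h_neg = sigmoid(c + v_neg @ W)

    # ii. Parameter updates (positive phase minus negative phase).
    W += epsilon * (np.outer(v_t, h_data) - np.outer(v_neg, h_neg))
    b += epsilon * (v_t - v_neg)
    c += epsilon * (h_data - h_neg)
    return v_neg

rng = np.random.default_rng(0)
data = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)  # toy dataset
W = rng.normal(scale=0.01, size=(4, 2))
b = np.zeros(4)
c = np.zeros(2)
for epoch in range(100):  # fixed epoch count as a simple stopping criterion
    for v_t in data:
        cd_k_update(v_t, W, b, c, epsilon=0.1, k=1, rng=rng)
```

In practice the update is usually averaged over a mini-batch rather than applied per example. The per-example form is kept here to mirror the pseudocode.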

The following figure is a representation of the feature detectors. The hidden nodes encode a lower dimensional representation of the data (visible nodes).
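One way to produce such a picture is to treat each hidden unit's incoming weights as an image. Assuming 784 visible units (28x28 MNIST pixels) and `W` of shape (visible, hidden), each column of `W` can be reshaped to 28x28 and tiled into a grid. The helper name `tile_filters`, the 64 hidden units and the random stand-in weights below are hypothetical.

```python
import numpy as np

def tile_filters(W, image_shape=(28, 28), grid_shape=(8, 8)):
    """Arrange the first rows*cols columns of W into one 2-D image grid."""
    height, width = image_shape
    rows, cols = grid_shape
    grid = np.zeros((rows * height, cols * width))
    for j in range(min(rows * cols, W.shape[1])):
        r, col = divmod(j, cols)
        grid[r * height:(r + 1) * height, col * width:(col + 1) * width] = (
            W[:, j].reshape(image_shape)
        )
    return grid

W = np.random.default_rng(0).normal(size=(784, 64))  # stand-in for trained weights
grid = tile_filters(W)
print(grid.shape)
```

The resulting array can be displayed with any image viewer, for example matplotlib's `imshow` with a grayscale colormap.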

Usage

Run the main.ipynb file in jupyter

Results

Work in progress.

Extensions

Deep Boltzmann Machines and Deep Belief Networks.

Contrastive Divergence with k > 1 steps of MCMC simulation (CD-k), with weight cost or temperature [Tieleman 08]. See this video for Persistent Contrastive Divergence.

...
