drckf / paysage

Unsupervised learning and generative models in python/pytorch.

License: Other

Python 99.85% Dockerfile 0.15%

boltzmann-machines rbm machine-learning unsupervised-learning generative-model

paysage's Introduction

Note: this repository has not been under active development for quite some time

Paysage

Paysage is a library for unsupervised learning and probabilistic generative models written in Python. The library is still in the early stages and is not yet stable, so new features will be added frequently.

Currently, paysage can be used to train things like:

Bernoulli Restricted Boltzmann Machines
Gaussian Restricted Boltzmann Machines
Hopfield Models

These models are trained using advanced mean field and Markov Chain Monte Carlo methods.

Physics-inspired machine learning

Better performance through better algorithms. We are focused on making better Monte Carlo samplers, initialization methods, and optimizers that allow you to train Boltzmann machines without emptying your wallet for a new computer.
Stay close to Python. Everybody loves Python, but sometimes it is too slow to get the job done. We want to minimize the amount of computation that gets shifted to the backend by targeting efforts for acceleration to the main bottlenecks in training.

Installation:

We recommend using paysage with Anaconda3. Simply,

Clone this git repo
Move into the directory with setup.py
Run “pip install -e .”

Running the examples requires a file mnist.h5 containing the MNIST dataset of handwritten images. The script download_mnist.py in the mnist/ folder will fetch the file from the web.

Using PyTorch

Paysage uses one of two backends for performing computations. By default, computations are performed using numpy/numexpr/numba on the CPU. If you have installed PyTorch, then you can switch to the pytorch backend by changing the setting in paysage/backends/config.json to pytorch. If you have a CUDA enabled version of pytorch, you can change the setting in paysage/backends/config.json from cpu to gpu to run on the GPU.
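
As a rough illustration of the switch described above, the snippet below rewrites a JSON settings file in place. The key names "backend" and "processor" and the helper set_backend are assumptions made for this sketch, not taken from paysage; check the actual contents of paysage/backends/config.json before relying on them.

```python
import json
from pathlib import Path


def set_backend(config_path, backend="pytorch", processor="cpu"):
    """Update a JSON config file in place and return the new settings.

    The keys "backend" and "processor" are assumed for illustration only.
    """
    path = Path(config_path)
    settings = json.loads(path.read_text())
    settings["backend"] = backend      # e.g. "pytorch"
    settings["processor"] = processor  # e.g. "cpu" or "gpu"
    path.write_text(json.dumps(settings, indent=2))
    return settings
```

Calling set_backend("paysage/backends/config.json", "pytorch", "gpu") would then select the GPU, provided the file really uses those keys.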

System Dependencies

hdf5 1.8, required by tables
llvm, llvm-config required by scikit-learn

About the name:

Boltzmann machines encode information in an "energy landscape" where highly probable states have low energy and improbable states have high energy. The name "landscape" was already taken, but the French translation "paysage" was not.

paysage's Issues

Use of GPU in paysage

Hi,

I am trying to use the GPU option as in
http://physics.bu.edu/~pankajm/ML-Notebooks/HTML/NB17_CXVI_RBM_mnist.html
I have changed the config file appropriately as in the instructions.

However, with the config file changed to "pytorch" and "gpu" I get an unexpected error:
File "rbm-train.py", line 376, in
batch_reader = batch.in_memory_batch(data[phase], batch_size, train_fraction=0.95, transform=transform)
File "paysage/paysage/batch/batch.py", line 41, in in_memory_batch
return Batch({'train': in_memory.InMemoryTable(tensor_train, batch_size, transform),
File "paysage/paysage/batch/in_memory.py", line 51, in __init__
self.nrows, self.ncols = be.shape(self.tensor)
File "paysage/paysage/backends/pytorch_backend/matrix.py", line 136, in shape
return tuple(tensor.size())
TypeError: 'int' object is not callable

These are the PyTorch versions and CUDA I use.
torch 1.4.0
torchvision 0.5.0
CUDA 10.1

The error still persists when "pytorch" and "cpu" are used in the backends config file.
Is there something wrong with the code written for use with pytorch?

Thank you
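
One possible reading of this traceback, offered as a guess rather than a confirmed diagnosis: the object reaching shape() may be a numpy array rather than a torch tensor. On a numpy array, size is an integer attribute, so calling it raises exactly this kind of TypeError. The sketch below reproduces that with numpy alone; the shape helper shown is hypothetical and is not the paysage implementation.

```python
import numpy as np


def shape(tensor):
    """Return the dimensions of an array-like object as a tuple.

    Hypothetical alternative to a size()-based helper: numpy arrays and
    torch tensors both expose a .shape attribute.
    """
    return tuple(tensor.shape)


x = np.zeros((4, 3))

# numpy's .size is an int (the element count), so calling it fails
try:
    tuple(x.size())
    message = None
except TypeError as exc:
    message = str(exc)

print(message)
print(shape(x))
```

If this is the cause, converting the data to a tensor of the active backend before building the batch object would be the thing to check.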

Trace sources of non-determinism

We are seeing non-determinism in training despite the deterministic seeds in the code. We need to figure out what the source is (library versions, etc.).

Markov Random Fields

Implement Ising and Potts models

Dropout RBMs

Dropout RBMs are discussed in the original dropout paper with results superior to normal RBMs.

Partial fit

Is there a way to partially fit an RBM in paysage? Thank you.

Shift logic from models to layers to improve modularity

In the current version, layers are just thin wrappers around a few functions that provide access to random sampling routines and a few other things like the mean or partition function of a distribution. All of the parameters are located in the model class and all of the logic is handled by the model class.

We can write a fairly general form that encompasses different types of Boltzmann machines as:

E(v, h) = -\sum_i a_i(v_i) - \sum_j b_j(h_j) - \sum_{ij} W_{ij} v_i h_j

where a_i(.) and b_j(.) are functions rather than just parameters. These functions should be defined by the layers. Also, the weights are defined by how the layers are stacked together. In this way, we could define a Bernoulli-Bernoulli RBM like:

BernoulliRBM = Model( [BernoulliLayer(n_visible), BernoulliLayer(n_hidden)] )

And a Gaussian RBM like:

GaussianRBM = Model( [GaussianLayer(n_visible), BernoulliLayer(n_hidden)] )

Or stack things into multiple layers like:

DBM = Model( [BernoulliLayer(nvis), BernoulliLayer(nhid_1), BernoulliLayer(nhid_2)] )

This will require quite a bit of thought and work, but it should be a high priority.
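
To make the proposal concrete, here is a minimal numpy sketch of how layers could own the single-site terms while the model owns only the couplings between adjacent layers. Everything in it (the site_energy method, the attribute names, the weight initialization, the unscaled Gaussian coupling) is an assumption of this sketch, not the eventual paysage design.

```python
import numpy as np


class BernoulliLayer:
    """Binary units with a linear term a_i(x_i) = bias_i * x_i."""

    def __init__(self, n_units):
        self.n_units = n_units
        self.bias = np.zeros(n_units)

    def site_energy(self, x):
        # -sum_i a_i(x_i), one value per row of x
        return -(x @ self.bias)


class GaussianLayer:
    """Real units with a_i(x_i) = -(x_i - loc_i)^2 / (2 * var_i)."""

    def __init__(self, n_units):
        self.n_units = n_units
        self.loc = np.zeros(n_units)
        self.var = np.ones(n_units)

    def site_energy(self, x):
        # -sum_i a_i(x_i) is a positive quadratic penalty
        return np.sum((x - self.loc) ** 2 / (2.0 * self.var), axis=1)


class Model:
    """Stack of layers with one weight matrix per adjacent pair."""

    def __init__(self, layers, seed=0):
        rng = np.random.default_rng(seed)
        self.layers = layers
        self.weights = [
            0.01 * rng.standard_normal((lower.n_units, upper.n_units))
            for lower, upper in zip(layers[:-1], layers[1:])
        ]

    def joint_energy(self, states):
        # states: one (batch, n_units) array per layer
        energy = sum(layer.site_energy(x) for layer, x in zip(self.layers, states))
        for W, lower, upper in zip(self.weights, states[:-1], states[1:]):
            energy = energy - np.sum((lower @ W) * upper, axis=1)
        return energy


rbm = Model([BernoulliLayer(3), BernoulliLayer(2)])
dbm = Model([BernoulliLayer(4), BernoulliLayer(3), BernoulliLayer(2)])
```

With this split, adding a new unit type only means writing a new layer class; the model code that sums site terms and couplings stays unchanged.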

More stable gradient estimates

We should implement one of the so-called "enhanced gradient" approaches. Personally, I vote for the centered gradient approach from "How to Center Binary Deep Boltzmann Machines" by Melchior, Fischer, and Wiskott.
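
As a starting point for discussion, the sketch below shows only the centering idea applied to the weight gradient: both the data-phase and model-phase correlations are computed after subtracting offsets from the visible and hidden states. The function name, the sign convention (data minus model), and the use of batch means as offsets are assumptions of this sketch; it does not reproduce the full update rule from the paper.

```python
import numpy as np


def centered_weight_gradient(v_data, h_data, v_model, h_model, mu, lam):
    """Difference of centered correlations between data and model phases.

    v_*: (batch, n_visible), h_*: (batch, n_hidden)
    mu, lam: offsets subtracted from visible and hidden units
    """
    pos = (v_data - mu).T @ (h_data - lam) / len(v_data)
    neg = (v_model - mu).T @ (h_model - lam) / len(v_model)
    return pos - neg


rng = np.random.default_rng(0)
v_d = rng.integers(0, 2, size=(6, 4)).astype(float)
h_d = rng.integers(0, 2, size=(6, 3)).astype(float)
v_m = rng.integers(0, 2, size=(6, 4)).astype(float)
h_m = rng.integers(0, 2, size=(6, 3)).astype(float)

# offsets chosen here as the data-phase means (an assumption)
grad = centered_weight_gradient(v_d, h_d, v_m, h_m, v_d.mean(axis=0), h_d.mean(axis=0))
```

Setting both offsets to zero recovers the usual uncentered difference of correlations.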

Hopfield Network [No hidden layer]

Is it possible to use Paysage to make a classical Hopfield network, without hidden units? In the examples, there is a Hopfield Network for MNIST, but this has hidden units. When I try to remove them (either by setting n_hidden to 0 or by removing the hidden layer from the layer list), I get a number of errors, which may be mostly due to the utils/plotting files in the examples assuming that there is a hidden layer.
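
For reference, this is what is meant by a classical Hopfield network here: a minimal numpy sketch with Hebbian couplings and no hidden units. It does not use paysage at all, the function names are made up for this example, and it uses synchronous updates for brevity where the classical model updates one unit at a time.

```python
import numpy as np


def hebbian_weights(patterns):
    """Hebbian couplings for +/-1 patterns of shape (n_patterns, n_units)."""
    n_units = patterns.shape[1]
    weights = patterns.T @ patterns / n_units
    np.fill_diagonal(weights, 0.0)  # no self-coupling
    return weights


def recall(weights, state, n_steps=5):
    """Apply synchronous sign updates starting from `state`."""
    state = state.copy()
    for _ in range(n_steps):
        state = np.where(weights @ state >= 0, 1, -1)
    return state


pattern = np.array([1, -1, 1, 1, -1, -1, 1, -1])
W = hebbian_weights(pattern[None, :])

noisy = pattern.copy()
noisy[0] *= -1  # flip one unit
restored = recall(W, noisy)
```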

Layer-by-layer training of deep models

http://www.jmlr.org/proceedings/papers/v5/salakhutdinov09a/salakhutdinov09a.pdf
https://papers.nips.cc/paper/4610-a-better-way-to-pretrain-deep-boltzmann-machines.pdf
http://www.cs.cmu.edu/~rsalakhu/papers/dbmrec.pdf

Deprecate marginal_free_energy

We should convert the EnergyGap and EnergyZScore methods to use the joint_energy instead of the marginal_free_energy because the former is well-defined for deep models whereas the latter is not really computable. In fact, we should probably deprecate the marginal_free_energy function, which may simplify the code a bit.

This is really easy to fix, so we should just do it.

mnist.h5 no longer available

Hi, the download site https://sites.google.com/site/charleskennethfisher is no longer available. Can I find mnist.h5 somewhere else?

About paysage.models.model

In your notebook, you use from paysage.models.model import Model to construct the model, but I cannot find any such function. Has that part of the code changed?

Docs

We need to write some documentation.

Use sampled states to compute derivatives

For a Boltzmann machine with a single hidden layer (i.e. an RBM) the derivatives can be written using the conditional mean of the hidden units, e.g. < v_i E[ h_j | v] >. The conditional means cannot be computed for models with additional hidden layers, which means we need to switch to computing the derivatives using sampled configurations, e.g. < v_i h_j >.

I've started to implement this change by creating a State class that holds the states of the units for all of the layers in the model. Various functions in the code need to be changed to operate on states -- these functions are marked with TODO statements.
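
The difference between the two estimators can be sketched in numpy for a single Bernoulli hidden layer, where both are available. The function names and the plain-array interface are assumptions of this sketch and do not correspond to the State class mentioned above.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def hidden_mean(v, W, b):
    """E[h | v] for Bernoulli hidden units."""
    return sigmoid(v @ W + b)


def weight_stat_from_means(v, W, b):
    """Estimate of < v_i E[h_j | v] > over a batch."""
    return v.T @ hidden_mean(v, W, b) / len(v)


def weight_stat_from_samples(v, W, b, rng):
    """Estimate of < v_i h_j > using one sampled h per row of v."""
    prob = hidden_mean(v, W, b)
    h = (rng.random(size=prob.shape) < prob).astype(float)
    return v.T @ h / len(v)


rng = np.random.default_rng(0)
v = rng.integers(0, 2, size=(8, 3)).astype(float)
W = rng.standard_normal((3, 2))
b = np.zeros(2)

from_means = weight_stat_from_means(v, W, b)
from_samples = weight_stat_from_samples(v, W, b, rng)
```

Both have the same expectation; the sampled version is noisier per batch but only needs a configuration of the neighboring layer, which is what remains available in deeper models.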

Make backends consistent for vector inputs

When we pass in a vector to many of our backend functions and specify an axis argument, the two backends return different objects - floats vs. single element tensors.

Example:

from paysage.backends.python_backend import matrix as npbe
from paysage.backends.pytorch_backend import matrix as ptbe
import numpy as np
x = np.random.rand(5)
t = ptbe.float_tensor(x)
npbe.mean(x, axis=0)
ptbe.mean(t, axis=0)

returns 0.59536025460813957 for the python backend and 0.5954 [torch.FloatTensor of size 1] for the pytorch backend.

This is almost certainly happening for many backend functions, and needs to be fixed and covered by tests. The behavior should probably be that when a vector input is passed along with an axis argument, a single-element vector is returned.
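
A minimal sketch of the proposed behavior on the numpy side, assuming a hypothetical wrapper named mean (this is not the existing backend function): a vector passed with an axis argument comes back as a one-element array rather than a scalar.

```python
import numpy as np


def mean(x, axis=None):
    """Mean that keeps vector inputs as arrays when an axis is given.

    With axis=None a Python float is returned. For a 1-D input with an
    axis, a one-element array is returned instead of a scalar.
    """
    x = np.asarray(x)
    if axis is None:
        return float(np.mean(x))
    if x.ndim == 1:
        return np.mean(x, axis=axis, keepdims=True)
    return np.mean(x, axis=axis)


vector_result = mean(np.arange(5.0), axis=0)       # one-element array
matrix_result = mean(np.ones((4, 3)), axis=0)      # array of length 3
```

A test for each backend function could then assert the returned shape for vector inputs, so that both backends are held to the same contract.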

Heat capacity training metric

The heat capacity is a good metric for determining if an RBM is in a {glassy, ferromagnetic, paramagnetic} phase, and its change during training can indicate if the model is approaching a phase transition.

I'd like to implement the heat capacity via MCMC sampling as well as via the TAP3/4 formulas.

Once we can estimate the phase during training we can perhaps change or even switch training methods dynamically according to which hyperparameters work best in which phases.
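
For the MCMC route, a minimal sketch could estimate the heat capacity from the energies of sampled configurations using the standard relation C = beta^2 * Var(E) (with k_B = 1). The function name, the default beta, and the lack of any per-unit normalization are assumptions of this sketch.

```python
import numpy as np


def heat_capacity(energies, beta=1.0):
    """beta^2 * Var(E) from sampled energies, using a two-pass variance."""
    energies = np.asarray(energies, dtype=np.float64)
    centered = energies - energies.mean()
    # mean of squared deviations, so the result cannot be negative
    return beta ** 2 * np.mean(centered ** 2)


sampled_energies = np.array([0.0, 2.0, 0.0, 2.0])
c = heat_capacity(sampled_energies)
```

The energies would come from whatever sampler is already used during training; only the variance step is shown here.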

Add TAP method for GRBMs

Currently, TAP training methods are supported only for binary-binary RBMs. We should be able to turn this into a general training method and run it, in particular, on GRBMs.

Capsules

http://www.cs.toronto.edu/~fritz/absps/transauto6.pdf

Supervised and Semi-supervised RBMs

http://www.jmlr.org/papers/volume13/larochelle12a/larochelle12a.pdf
http://machinelearning.org/archive/icml2008/papers/601.pdf

Updated backend with GPU support

Paysage is being designed to use multiple backends. Currently, there are:
-- a numpy/numexpr/numba backend
-- a backend that uses pytorch (https://github.com/pytorch/pytorch) (in progress)

Add Unit Tests

A set of tests that check the basic functionality of the code will enable people to experiment more broadly by providing feedback to make sure things are still working as they should. We should decide on an appropriate framework such as pytest or behave and start to implement things as we go.

Advanced real valued RBMs

Mean-Covariance: http://www.cs.toronto.edu/~ranzato/publications/mcRBMdeepPhoneRec.pdf
http://www.cs.toronto.edu/~fritz/absps/mcimage.pdf
Products of Student-t Distributions: http://www.cs.toronto.edu/~fritz/absps/PoT.pdf
Spike-and-Slab (original): http://jmlr.csail.mit.edu/proceedings/papers/v15/courville11a/courville11a.pdf
- convolutional http://jmlr.org/proceedings/papers/v31/luo13a.pdf
- mu version http://www.icml-2011.org/papers/591_icmlpaper.pdf

I have simplified code for most of them and am happy to help.

Heat Capacity metric is unstable

Sometimes the heat capacity goes negative during training. Since it's a variance, this shouldn't happen.

Advanced Initialization Methods

Minibatch versions of k-means, PCA, and archetype analysis can provide reasonable initial guesses for the parameters of latent variable models. Mean-field methods for RBMs: https://arxiv.org/abs/1702.03260

Conditional and recurrent RBMs

https://pdfs.semanticscholar.org/b32e/0f9d5a99c9ed2d245c6947f9b09773ddfe9f.pdf
http://www.cs.toronto.edu/~fritz/absps/rtrbm.pdf

Use common statistic calculation routines

We calculate the mean and variance of several different parameters across the code base (mainly in layers and metrics). These are all online calculations using batches of data, so it would be good to centralize the calculation routines and call them as needed.
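
One way such a shared routine could look is sketched below: a small accumulator that merges per-batch means and sums of squared deviations, so layers and metrics could call the same code. The class name and interface are assumptions of this sketch; the merge step is the standard pairwise update for combining the moments of two groups.

```python
import numpy as np


class RunningMoments:
    """Accumulates the per-column mean and variance over batches."""

    def __init__(self):
        self.count = 0
        self.mean = None
        self.m2 = None  # running sum of squared deviations from the mean

    def update(self, batch):
        batch = np.asarray(batch, dtype=np.float64)
        n_b = batch.shape[0]
        mean_b = batch.mean(axis=0)
        m2_b = ((batch - mean_b) ** 2).sum(axis=0)
        if self.count == 0:
            self.count, self.mean, self.m2 = n_b, mean_b, m2_b
            return
        total = self.count + n_b
        delta = mean_b - self.mean
        # merge the moments of the old data with those of the new batch
        self.mean = self.mean + delta * n_b / total
        self.m2 = self.m2 + m2_b + delta ** 2 * self.count * n_b / total
        self.count = total

    @property
    def variance(self):
        # population variance (divides by the number of rows seen)
        return self.m2 / self.count


data = np.random.default_rng(0).normal(size=(10, 3))
moments = RunningMoments()
moments.update(data[:4])
moments.update(data[4:])
```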

Import problem with 'backends'

After installing paysage with pip in an Anaconda 3 environment, I get the following ImportError:

>>> import paysage
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mbukov/anaconda3/envs/DNN/lib/python3.6/site-packages/paysage/__init__.py", line 1, in <module>
    from . import backends
ImportError: cannot import name 'backends'

OS specifics:

python: Anaconda 3.6

Python 3.6.3 |Anaconda, Inc.| (default, Nov  8 2017, 18:10:31) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

OS: Mac OS X High Sierra

Multiple Connections in Layers

Currently, the layer objects (denoted by vis) assume they are connected to a single layer (denoted by hid) by a fully connected weights layer. Layers in conditional and recurrent RBMs are connected to two layers, however. For example, the visible layer at time t in a conditional RBM is connected to the visible layer at time t-1 and the hidden layer at time t. It should be pretty easy to generalize the current setup to allow for this behavior by passing a layer the list of connected layers rather than a single object. Making this change now shouldn't affect any of the models we have already implemented, but will make it a lot easier to add temporal models.
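
The core of the change can be sketched as a function that sums the linear input from every connected layer instead of from a single one. The function name and the list-based interface are assumptions of this sketch, not existing paysage code.

```python
import numpy as np


def total_input(connected_states, connected_weights):
    """Sum the linear input a layer receives from all connected layers.

    connected_states: list of arrays, each of shape (batch, n_k)
    connected_weights: list of arrays, each of shape (n_k, n_units)
    """
    return sum(x @ W for x, W in zip(connected_states, connected_weights))


# visible layer at time t in a conditional RBM: input from v(t-1) and h(t)
v_prev = np.array([[1.0, 0.0, 1.0]])
h_now = np.array([[1.0, 1.0]])
W_prev = np.ones((3, 4))   # couples v(t-1) to v(t)
W_hid = 0.5 * np.ones((2, 4))  # couples h(t) to v(t)

field = total_input([v_prev, h_now], [W_prev, W_hid])
```

With a single connected layer the lists have length one, so existing models would reduce to the current behavior.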

Training using fixed point equations instead of sampling

https://calculatedcontent.com/2016/10/21/improving-rbms-with-physical-chemistry/

Offset values in the energy function

All non-binary D/RBMs augment the basic (binary case) energy function with other terms based on the distribution of visible units. This is even more evident when using supervised or temporal RBMs or for efficient learning of joint DBMs (centering of hidden representations). Since the log energy function is always a linear function of hidden and visible units, all of these more complicated terms can be added as an offset to the basic formula (this can be done in a smart way also for the free energy function, by saving intermediate calculations into separate terms and reusing them later). So it would be nice to keep this in mind when designing the base classes.