blei-lab / edward
A probabilistic programming language in TensorFlow. Deep generative models, variational inference.
Home Page: http://edwardlib.org
License: Other
Discrete
Continuous
useful references:
See the branch original for its NumPy/SciPy implementation.
This will require quite a bit of foresight. We need the right structure to allow future algorithmic endeavors, e.g., sampling routines, meta-inference/hyperparameter optimization, optimization-based stuff such as tempering.
Given the current structure, we should be fine to do the stuff we currently want to do (inference on HVMs, alpha/f-divergences).
Can I define/learn a stochastic block model (SBM) with Edward, and if so, what inference engine should I use?
With pymc3 I would define SBM, for some value K (number of blocks/clusters) as such:
```python
# group assignment probabilities
pi = Dirichlet(name='pi', a=5 * np.ones(K))
# group assignments for nodes
q = Categorical(name='q', p=pi, shape=N)
# connection probabilities among groups
eta = Beta(name='eta', alpha=1, beta=1, shape=(K, K))
# adjacency matrix
R = Bernoulli(name='R', p=eta[q, :][:, q], observed=R_observed, shape=(N, N))
```
As a probabilistic graphical model it would look like this:
In other words I model my graph (with N nodes and some edges) by modeling the adjacency matrix `R`. Each edge `i, j` is independent from the other edges and is generated as such:

- draw `pi` from a Dirichlet
- draw `q_i`, the categorical cluster assignment for the i-th node, given `pi`
- draw `q_j`, the categorical cluster assignment for the j-th node, given `pi`
- from the `K x K` matrix of Beta variables, pick the Beta variable corresponding to the cluster assignments (e.g. `q_i = a` and `q_j = b` => we sample from the Beta variable at index `a, b`)

The adjacency matrix `R` is what is observed, and I would like to learn the latent `eta` and `q`. The MCMC approach with pymc3 works well enough, but a variational approach might be better.
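For concreteness, the generative process above can be sketched as a plain NumPy forward simulation (a toy sketch, not a pymc3 or Edward model; the names mirror the description above):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 30, 3  # number of nodes and number of blocks/clusters

pi = rng.dirichlet(5 * np.ones(K))   # group assignment probabilities
q = rng.choice(K, size=N, p=pi)      # categorical group assignment per node
eta = rng.beta(1, 1, size=(K, K))    # connection probabilities among groups
R = rng.binomial(1, eta[q][:, q])    # N x N binary adjacency matrix
```

Note `eta[q][:, q]` is the NumPy spelling of `eta[q, :][:, q]` from the pymc3-style snippet: it expands the K x K block probabilities to an N x N matrix of edge probabilities.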
If this kind of question is not suited here please let me know.
What's the right abstraction, and what base class methods and members should all variational model classes share? Further, how do we mix and match them, so it's not as blocky as "MFGaussian" but can, e.g., be a choice of variational family for each dimension, or a specification of a joint distribution.
This choice will be particularly relevant for designing classes for hierarchical variational models, in which you will have some arbitrary stacking of these guys.
This is the first page users will read when they learn how to use the library.
I'm not sure how much this affects our current runtimes, but right now, in order to sample from q, we have to run the session to realize the variational parameters and then sample using scipy.
TensorFlow supports uniform and standard normal, and from there we can try to use clever hacks in the literature to get fast and numerically stable samples from other distributions (Devroye, 1986?). Here are some off the top of my head:
- `tf.random_normal`
- a transform of `tf.random_normal`
- `tf.select` from a uniform
- `eps ~ Unif(0,1)`, then `log(1/eps)` to transform it

References
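As a sketch of the inverse-CDF trick in the last item (plain NumPy for illustration; a TensorFlow version would start from its uniform sampler instead):

```python
import numpy as np

def sample_exponential(rate, size, seed=0):
    # Inverse-CDF trick: if eps ~ Unif(0,1), then log(1/eps) / rate ~ Exponential(rate).
    rng = np.random.default_rng(seed)
    eps = rng.uniform(size=size)
    return np.log(1.0 / eps) / rate

samples = sample_exponential(2.0, size=100000)
```

The sample mean should be close to `1/rate = 0.5`, which is a quick sanity check on the transform.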
It's already taken on the python package index :( https://pypi.python.org/pypi/blackbox/0.7.1
There are many small functions in `edward/util.py` that we depend on; let's write unit tests for them.
How do we subsample data for matrices, ragged arrays (e.g., time series), tensors, etc.?
Dave brought up the idea of abstracting the data specification: users will have one way to specify the data, and in the backend the data provided will be morphed into a Stan data model, a TensorFlow data model, etc.
GPy has incredible tutorial notebooks. They teach basic concepts related to GPs and use GPy as a basis for explaining these concepts. I think it's a great idea, and I've personally found them useful as I dug through the GP literature.
I think we can do something similar. Each notebook instructively explains a concept. The code is similar to the end-to-end examples currently in the repo, but with more words and explanation about the various options that are used. Here are some examples of concepts to teach:
functions from data point to variational parameters
useful references:
We want to support modeling languages which are either popular or are useful for certain tasks over the alternatives we support. With that in mind, pymc3 seems appealing for specifying large discrete latent variable models. You can't write them as a Stan program, and it could be rather annoying to code them up in raw TensorFlow or NumPy/SciPy.
On the other hand, it's one more modeling language to maintain; pymc3 actually uses Theano as a backend, which may lead to some bugs(?); and I don't know how popular pymc3 would be as a use case over the other supported languages.
see #62 (comment)
what are good default functions we should have for plotting?
Right now in all examples we run inference and stop after looking at the ELBO via the print progress. We should have a simple function to do predictive checks (both prior and posterior).
Here's a general spec:
```python
def predictive_check(T, likelihood, latent, S=100):
    """Predictive check.

    It is a prior predictive check if latent is the prior
    and a posterior predictive check if latent is the posterior.
    (Box, 1980; Gelman, Meng, and Stern, 1996)

    It forms an empirical distribution for the predictive discrepancy,
        q(T) = \int p(T(x) | z) q(z) dz
    by drawing S replicated data sets xrep and calculating T(xrep)
    for each data set.

    Arguments
    ---------
    T : function
        Test statistic.
    likelihood : class with 'sample' method
        Likelihood distribution p(x | z) to sample from.
    latent : class with 'sample' method
        Latent variable distribution q(z) to sample from.
    S : int, optional
        Number of replicated data sets for the predictive check.

    Returns
    -------
    np.ndarray
        Vector of S elements, (T(xrep^1), ..., T(xrep^S)).
    """
```
The `Data` class should have a unified data structure for housing the data, so we can write these methods against that one structure. Then it should output the corresponding object of class `Tensor`, `dict`, or `np.ndarray`, depending on the modeling language.
Write all computation to use float64, e.g., use `tf.float64` and `np.float64` when specifying `dtype`s, and cast objects to float64 if necessary.
for mean-field variational inference:
(I don't know how to tell the algorithm when KL(q(z) || p(z)) is analytic, or more generally when any portion of the generative model is.)
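As a concrete instance of the analytic case: when q and p are both Gaussians, KL(q(z) || p(z)) has a closed form, which a Monte Carlo estimate of the same quantity should match (illustrative NumPy sketch, not Edward code):

```python
import numpy as np

def kl_normal(mu_q, sig_q, mu_p, sig_p):
    # Closed-form KL(N(mu_q, sig_q^2) || N(mu_p, sig_p^2)).
    return (np.log(sig_p / sig_q)
            + (sig_q**2 + (mu_q - mu_p)**2) / (2 * sig_p**2)
            - 0.5)

rng = np.random.default_rng(0)
z = rng.normal(1.0, 0.5, size=200000)  # z ~ q = N(1, 0.5^2)
log_q = -0.5 * ((z - 1.0) / 0.5)**2 - np.log(0.5 * np.sqrt(2 * np.pi))
log_p = -0.5 * z**2 - np.log(np.sqrt(2 * np.pi))

mc_kl = np.mean(log_q - log_p)          # Monte Carlo estimate of KL(q || p)
analytic_kl = kl_normal(1.0, 0.5, 0.0, 1.0)
```

An algorithm that knows the KL term is analytic can use `analytic_kl` directly and only estimate the remaining expectation, reducing gradient variance.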
Start formalizing unit testing procedure. #1 also relevant
It'd be nice to have a language to specify directed acyclic graphs, so we can take advantage of Rao-Blackwellization/Markov blankets. Also it'd be nice to take advantage of the case when all full conditional distributions are an exponential family.
We need to generalize our inference and model objects as follows:
- `log_prob(zs)` should also be a function of data points `x`.
- learning model parameters
It'd be great to have high profile model examples which we can highlight on the front page. Some ideas:
We can think about the choice of inference algorithm and data for the above later.
Currently the output for distribution methods (e.g., `logpdf()`, `logpmf()`, `entropy()`) is a scalar.
SciPy's calls vary their output dimensions depending on the input. For example, the following output a scalar, a 1-dimensional vector, and a 1x1 matrix, respectively.

```python
from scipy import stats
print(stats.norm.logpdf(0.0))
print(stats.norm.logpdf([0.0]))
print(stats.norm.logpdf([[0.0]]))
```
We've so far followed SciPy convention for names (distributions, methods, arguments) and the flexibility in the input by casting and squeezing. Therefore I think we should also follow this output behavior.
We currently use `n_minibatch` to define the number of samples we use for Monte Carlo estimation of the gradient. We also use the same `n_minibatch` to evaluate the objective function (e.g. the ELBO) every `n_print` iterations. We might want to use a larger number, `n_minibatch_obj`, to evaluate the objective function; this would help in assessing convergence.
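A toy illustration of the trade-off (plain NumPy; the point is that a small sample count is fine for noisy gradient steps, while a larger `n_minibatch_obj`-style count gives a much less noisy objective estimate for convergence checks):

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_estimate(n_samples):
    # Monte Carlo estimate of E[z^2] = 1 under z ~ N(0, 1).
    z = rng.normal(size=n_samples)
    return np.mean(z**2)

# Small count (gradient-style) vs. large count (objective-evaluation-style).
small = np.array([mc_estimate(10) for _ in range(500)])
large = np.array([mc_estimate(1000) for _ in range(500)])
```

Both estimators are unbiased, but the large-sample one has far smaller spread, so it is the better choice for deciding whether the ELBO has converged.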
I think the library is ready to go public once we have:
We also need to figure out how to deal with active research branches on a public repo. We want these active research branches to be private. I think the best answer to this is to fork the public repo, as a private repo still on github.com/Blei-Lab. Then we have branches on the private repo, and whenever we think it should go into the public repo itself, we submit a pull request to the public repo.
Our algorithms perform inference on a `Model` class and require only one thing from it: the method `log_prob(self, zs)`, which takes an `n_minibatch x d` matrix of latent variables `z` and outputs a vector `[log p(x, z_{1,:}), ..., log p(x, z_{n_minibatch,:})]^T`. (Any data `x` is fixed and stored inside the class.)

In cases where we use the reparameterization gradient, we also require the gradient of the `log_prob()` function with respect to `z`. This is done automatically if `log_prob()` is implemented in TensorFlow.
We want to extend this to a manual specification of the gradient. The motivation is twofold: (1) a user may want to supply hand-written `log_prob()` and `grad_log_prob` functions; (2) wrapped modeling languages such as Stan provide their own gradients.

All of this should go behind the scenes: the algorithms will simply check if the `grad_log_prob()` method exists. If it exists, it uses the function. If it doesn't exist, then it tries to autodiff it using TensorFlow. This solves 1. To solve 2, we also require a wrapper that wraps a Stan program and data into this class with the two methods `log_prob` and `grad_log_prob`.
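A behind-the-scenes dispatch like the one described could be sketched as follows (hypothetical class names, NumPy in place of TensorFlow autodiff for a self-contained example):

```python
import numpy as np

class ManualGradModel:
    """Model that supplies its own gradient (as a Stan wrapper would)."""
    def log_prob(self, zs):
        return -0.5 * np.sum(zs**2, axis=1)  # log N(z | 0, I), up to a constant
    def grad_log_prob(self, zs):
        return -zs                           # analytic gradient w.r.t. z

class AutodiffModel:
    """Model with only log_prob; gradients must be derived automatically."""
    def log_prob(self, zs):
        return -0.5 * np.sum(zs**2, axis=1)

def get_gradient(model, zs):
    """Use grad_log_prob if the model provides it, else fall back.

    The fallback here is finite differences; in the design above it
    would be TensorFlow's automatic differentiation of log_prob.
    """
    if hasattr(model, 'grad_log_prob'):
        return model.grad_log_prob(zs)
    eps = 1e-6
    grads = np.empty_like(zs)
    base = model.log_prob(zs)
    for j in range(zs.shape[1]):
        zp = zs.copy()
        zp[:, j] += eps
        grads[:, j] = (model.log_prob(zp) - base) / eps
    return grads

zs = np.array([[1.0, 2.0], [0.5, -0.5]])
```

Both models expose the same interface to the inference algorithm; the `hasattr` check keeps the dispatch invisible to the user.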
This is a result of us changing how we parameterize the distributions, and letting the TensorFlow variables live outside the variational classes.
let's make sure we don't write an entire library that only works for ed.set_seed(42)
Generalize the current TensorFlow backend to also enable the use of Theano as an alternative.
Consider a `B x d` array of `zs`, where a row corresponds to one sample of a `d`-dimensional latent variable, and we have a mini-batch of size `B`.
Univariate Distributions
For mean-field methods, we'd like to do something like call `bernoulli.logpmf(zs[:, i], p)`, where `p` is a scalar in [0,1]. This returns a `B`-dimensional vector,

[ log Bernoulli(zs[1, i] | p), ..., log Bernoulli(zs[B, i] | p) ]^T

For a univariate distribution, it takes a `B`-dimensional input and returns a `B`-dimensional output.
Multivariate Distributions
Consider a `d`-dimensional multivariate Gaussian. We call `multivariate_normal.logpdf(zs.transpose(), mu, Sigma)`, where `mu` is `d`-dimensional and `Sigma` is `d x d`, and it returns a `B`-dimensional vector

[ log Normal(zs[1, :] | mu, Sigma), ..., log Normal(zs[B, :] | mu, Sigma) ]^T

For a `d`-dimensional distribution, it takes a `B x d` matrix of inputs and returns a `B`-dimensional output.
SciPy does this too!
```python
import numpy as np
from scipy import stats

# 4-d vector input, univariate normal
stats.norm.logpdf([0.0, 1.0, 1.0, 2.0], loc=0, scale=1)
## array([-0.91893853, -1.41893853, -1.41893853, -2.91893853])

# 4 x 2 matrix input, 2-d normal
stats.multivariate_normal.logpdf(
    np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]),
    mean=np.zeros(2), cov=np.diag(np.ones(2)))
## array([ -1.83787707,  -2.83787707,  -5.83787707, -10.83787707])
```
Higher-dimensional arguments
We can also consider something like `bernoulli.logpmf(zs[:, i], ps)`, where not only is `zs[:, i]` an `M`-dimensional vector but `ps` is also an `M`-dimensional vector (in `[0,1]^M`). I propose not doing this. This is bound to lead to bugs. Any time this comes up, I propose we do individual calls, `bernoulli.logpmf(zs[1, i], ps[i])` and so on.
(I don't know a situation where this comes up enough that vectorizing this computation is crucial. If we notice this we can make the change. I don't think SciPy allows this either.)
Implementation-wise, this is the setting where we can't store all of the data inside the `Data` class. What's the best way to handle this problem when streaming is integrated into the iterations of `inference.run()`?
reference implementations
We currently default to the reparameterization gradient if the `Variational` class implements `reparam`. However, if the `Inference` class does not support reparameterization gradients (e.g. `KLpq`), then it doesn't matter whether the `Variational` class implements it or not.
pls respond
that would be amazing :)
Congrats Dustin for Edward! It seems very promising for easing the inference of probabilistic models.
I've seen that documentation is in progress, but let me report a couple of things, in case you'd like to fix.
In the Tutorial for Research there's a wrong link, which points to the Stan example instead of the TensorFlow one.
"Here is a toy script that uses this model which is written in TensorFlow."
And where you say NumPy/SciPy, I guess you refer to Stan, since the link points to Stan and the NumPy/SciPy example is above.
"Here is another toy script that uses the same model written in NumPy/SciPy."
Seems like Keras is winning out among all the deep-learning abstraction libraries.
It would be great to be compatible with Keras' compositionality for adding neural network layers. This would be convenient for stacking NN layers following their Sequential and Graph model classes, as options to parameterize the probability model and/or the variational model.
It looks like all we need to do is write a wrapper around their `evaluate()` method, and hack evaluation of the log-likelihood as the compiled model object's "loss".
Convert everything to NumPy/SciPy standards. More documentation on developer process in our wiki page.