
blei-lab / edward

4.8K stars · 764 forks · 29.17 MB

A probabilistic programming language in TensorFlow. Deep generative models, variational inference.

Home Page: http://edwardlib.org

License: Other

Python 29.88% Makefile 0.12% Jupyter Notebook 69.81% Smarty 0.01% Dockerfile 0.18%
bayesian-methods data-science deep-learning machine-learning neural-networks probabilistic-programming statistics tensorflow

edward's People

Contributors

adjidieng, akucukelbir, bertini36, bhargavvader, cavaunpeu, cbonnett, chmp, closedloop, dawenl, dustinvtran, fmpr, isomap, jamestwebber, janekberger, jkerfs, jmxpearson, kashif, konstantinlukaschenko, lbollar, mariru, matthewdhoffman, nfoti, patrickeganfoley, rdipietro, rw, siddharth-agrawal, stephenra, timshell, twiecki, yoshikawamasashi


edward's Issues

unit testing

Start formalizing a unit testing procedure. #1 is also relevant.

add model examples

It'd be great to have high-profile model examples that we can highlight on the front page. Some ideas:

  • Bayesian linear regression
  • Hierarchical generalized linear model
  • Mixture model
  • Latent Dirichlet allocation
  • Probabilistic matrix factorization
  • Hidden Markov model
  • Stochastic block model
  • Undirected graphical model/Markov random field
  • Dirichlet process mixture model
  • Gaussian process
  • Poisson process
  • Bayesian neural network
  • Variational auto-encoder
  • Deep latent Gaussian model
  • Generative adversarial network
  • Bayesian word embedding model
  • Language model with LSTM
  • Sparse-Gamma deep exponential family
  • DRAW

We can think about the choice of inference algorithm and data for the above later.

variance reduction techniques

  • use analytic form of q's entropy when available (see the sketch after this list)
  • use analytic form of KL(q(z) || p(z)) when available
  • Rao-Blackwellization
  • control variates (the vanilla one, nothing fancy yet)
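
For the first item, a minimal sketch in TensorFlow of a reparameterized ELBO whose entropy term is computed analytically rather than by Monte Carlo; the target log density below is a toy stand-in for a model's log_prob, and all names are illustrative:

import numpy as np
import tensorflow as tf

# q(z) = Normal(mu, diag(sigma^2)) over d dimensions.
d = 2
mu = tf.Variable(tf.zeros([d]))
log_sigma = tf.Variable(tf.zeros([d]))
sigma = tf.exp(log_sigma)

# Reparameterized samples from q.
S = 10
eps = tf.random_normal([S, d])
z = mu + sigma * eps

# Monte Carlo estimate of E_q[log p(x, z)]; a toy unnormalized target
# stands in for the model's log_prob(z).
log_p = -0.5 * tf.reduce_sum((z - 1.0) ** 2, 1)

# Analytic entropy of a diagonal Gaussian replaces a Monte Carlo estimate
# of -E_q[log q(z)]: H(q) = d/2 * log(2*pi*e) + sum(log sigma).
entropy = 0.5 * d * (1.0 + np.log(2.0 * np.pi)) + tf.reduce_sum(log_sigma)

elbo = tf.reduce_mean(log_p) + entropy
loss = -elbo  # minimize the negative ELBO

Replacing the Monte Carlo entropy with the analytic one removes that term's sampling variance from the gradient entirely.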

support for essential log density functions

Discrete

  • Bernoulli
  • Binomial
  • Multinomial
  • Poisson
  • Geometric
  • Negative Binomial

Continuous

  • Uniform
  • Multivariate Normal
  • Student-t (with mean and scale parameters)
  • Gamma
  • Inverse Gamma
  • Exponential
  • Beta
  • Dirichlet
  • Log-Normal
  • Chi-squared
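
As a starting point, a minimal sketch of what one of these log densities might look like implemented directly in TensorFlow (illustrative only, not Edward's actual code):

import tensorflow as tf

def bernoulli_logpmf(x, p):
    # log Bernoulli(x | p) = x log p + (1 - x) log(1 - p), elementwise
    return x * tf.log(p) + (1.0 - x) * tf.log(1.0 - p)

logp = bernoulli_logpmf(tf.constant([0.0, 1.0, 1.0]), 0.7)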


use more samples to evaluate and print objective function

We currently use n_minibatch to define the number of samples used for Monte Carlo estimation of the gradient.

We also use the same n_minibatch to evaluate the objective function (e.g., the ELBO) every n_print iterations.

We might want to use a larger number, n_minibatch_obj, to evaluate the objective function; this would help in assessing convergence.
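
To illustrate the variance gap (a toy NumPy check; n_minibatch_obj itself is only proposed here, not an existing argument):

import numpy as np

# The variance of a Monte Carlo average of n samples shrinks like 1/n.
# Here we estimate E[z] = 0 under a standard normal as a stand-in for
# the ELBO estimate printed at each n_print iteration.
rng = np.random.RandomState(0)
est_1 = [rng.randn(1).mean() for _ in range(1000)]      # like n_minibatch = 1
est_100 = [rng.randn(100).mean() for _ in range(1000)]  # like n_minibatch_obj = 100
print(np.var(est_1), np.var(est_100))  # roughly 1.0 vs 0.01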

hacks to sample from distributions in tensorflow

I'm not sure how much this affects our current runtimes, but right now, in order to sample from q, we have to run the session to realize the variational parameters and then sample using scipy.

TensorFlow supports uniform and standard normal, and from there we can try to use clever hacks in the literature to get fast and numerically stable samples from other distributions (Devroye, 1986?). Here are some off the top of my head (a few are sketched in code after the list):

  • Uniform (built-in)
  • Normal (location-scale transform of tf.random_normal)
  • Multivariate Normal (location-scale transform of tf.random_normal)
  • Log-Normal (sample from normal, then exp() transform it)
  • Bernoulli (tf.select from a uniform)
  • Multinomial (Gumbel arg max trick)
  • Exponential (eps ~ Unif(0,1), log(1/eps) transform it)
  • Inverse Gamma (sample from Gamma, then 1/() transform it)
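
A sketch of a few of these transforms in TensorFlow (shapes and parameters are illustrative, not Edward's sampling code):

import tensorflow as tf

S = 100               # number of samples
mu, sigma = 1.0, 2.0  # arbitrary location and scale

eps = tf.random_normal([S])    # eps ~ N(0, 1)
normal = mu + sigma * eps      # Normal via location-scale transform
log_normal = tf.exp(normal)    # Log-Normal via exp() of a normal
u = tf.random_uniform([S])     # u ~ Unif(0, 1)
exponential = tf.log(1.0 / u)  # Exponential(1) via log(1/u)
bernoulli = tf.select(u < 0.3, tf.ones([S]), tf.zeros([S]))  # Bernoulli(0.3)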


utility function to perform predictive checks

Right now in all examples we run inference and stop after looking at the ELBO via the print progress. We should have a simple function to do predictive checks (both prior and posterior).

Here's a general spec:

def predictive_check(T, likelihood, latent, S=100):
    """
    Predictive check. It is a prior predictive check if latent is the prior
    and a posterior predictive check if latent is the posterior.
    (Box, 1980; Gelman, Meng, and Stern, 1996)

    It forms an empirical distribution for the predictive discrepancy,
    q(T) = \int p(T(x) | z) q(z) dz
    by drawing S replicated data sets xrep and calculating T(xrep) for each.

    Arguments
    ---------
    T: function
        Test statistic.
    likelihood: class with 'sample' method
        Likelihood distribution p(x | z) to sample from.
    latent: class with 'sample' method
        Latent variable distribution q(z) to sample from.
    S: int, optional
        Number of replicated data sets for the predictive check.

    Returns
    -------
    np.ndarray
        Vector of S elements, (T(xrep^1), ..., T(xrep^S)).
    """

add support for model compositionality with keras

Seems like Keras is winning out among the deep learning abstraction libraries.

It would be great to be compatible with Keras' compositionality for adding neural network layers. This would be convenient for stacking NN layers following their Sequential and Graph model classes, as options to parameterize the probability model and/or the variational model.

It looks like all we need to do is write a wrapper around their evaluate() method, and hack evaluation of the log-likelihood as the compiled model object's "loss".
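
A rough sketch of the idea, using a Sequential model to parameterize the likelihood; the dimensions and the log_lik wrapper are hypothetical, not an existing API:

import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense

d, x_dim = 10, 784  # latent and data dimensions (illustrative)

# Neural network mapping a latent z to Bernoulli probabilities for p(x | z).
net = Sequential()
net.add(Dense(256, activation='relu', input_dim=d))
net.add(Dense(x_dim, activation='sigmoid'))

# Hypothetical wrapper: evaluate the log-likelihood through the network.
def log_lik(x, z):
    p = net(z)  # Keras models are callable on tensors
    return tf.reduce_sum(x * tf.log(p) + (1.0 - x) * tf.log(1.0 - p), 1)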

Wrong link

Congrats to Dustin for Edward! It seems very promising for easing inference in probabilistic models.

I've seen that documentation is in progress, but let me report a couple of things, in case you'd like to fix them.
In the Tutorial for Research there's a wrong link, which points to the Stan example instead of the TensorFlow one:
"Here is a toy script that uses this model which is written in TensorFlow."

And where you say NumPy/SciPy, I guess you mean Stan, since the link points to Stan and the NumPy/SciPy example is above:
"Here is another toy script that uses the same model written in NumPy/SciPy."

design base class for inference algorithms

This will require quite a bit of foresight. We need the right structure to allow future algorithmic endeavors, e.g., sampling routines, meta-inference/hyperparameter optimization, optimization-based stuff such as tempering.

Given the current structure, we should be fine to do the stuff we currently want to do (inference on HVMs, alpha/f-divergences).
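
One possible shape for such a base class (a sketch only, with hypothetical names, not Edward's eventual design):

class Inference(object):
    """Base class for inference algorithms on a model and data."""
    def __init__(self, model, data=None):
        self.model = model
        self.data = data

    def run(self, n_iter=1000, n_print=100):
        self.initialize()
        for t in range(n_iter):
            info = self.update()  # one step of the particular algorithm
            if t % n_print == 0:
                self.print_progress(t, info)

    # Subclasses (variational inference, samplers, meta-inference, ...)
    # override these.
    def initialize(self):
        raise NotImplementedError()

    def update(self):
        raise NotImplementedError()

    def print_progress(self, t, info):
        print('iteration {}: {}'.format(t, info))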

extend data class to enable subsampling

The Data class should have a unified internal structure for housing the data, against which we can write the subsampling methods. It should then output the corresponding object of class Tensor, dict, or np.ndarray, depending on the modeling language.

a VIBES-like modeling language

It'd be nice to have a language to specify directed acyclic graphs, so we can take advantage of Rao-Blackwellization/Markov blankets. Also it'd be nice to take advantage of the case when all full conditional distributions are an exponential family.

empirical Bayes

learning model parameters

  • approximate maximum marginal likelihood (variational EM: joint learning of variational parameters + model parameters)
  • exact maximum marginal likelihood (gradient-based marginal optimization)

enable manual specification of model gradient and wrapper for Stan models

Our algorithms perform inference on a Model class and require only one thing from the class: the method log_prob(self, zs), which takes an n_minibatch x d matrix of latent variables z and outputs a vector [log p(x, z_{1,:}), ..., log p(x, z_{n_minibatch,:})]^T. (Any data x is fixed and stored inside the class.)

In cases where we use the reparameterization gradient, we also require the gradient of the log_prob() function with respect to z. This is done automatically if log_prob() is implemented in TensorFlow.

We want to extend this to a manual specification of the gradient. The motivation is twofold:

  1. we want to be able to specify the model with a Stan program and use its log_prob() and grad_log_prob() functions;
  2. a user may not be familiar with TensorFlow and prefer to implement the model in vanilla NumPy/SciPy, in which case they will hand-derive the gradient.

All of this should happen behind the scenes: the algorithms will simply check whether the grad_log_prob() method exists. If it exists, they use it; if it doesn't, they try to autodiff log_prob() using TensorFlow. This solves 2. To solve 1, we also require a wrapper that wraps a Stan program and data into this class with the two methods log_prob() and grad_log_prob().
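
A sketch of that behind-the-scenes check (the function name is hypothetical; tf.gradients does the autodiff case):

import tensorflow as tf

def get_grad_log_prob(model, zs):
    # Prefer a manually specified gradient (hand-derived, or supplied by
    # a Stan program's grad_log_prob) ...
    if hasattr(model, 'grad_log_prob'):
        return model.grad_log_prob(zs)
    # ... otherwise autodiff through a TensorFlow-implemented log_prob().
    return tf.gradients(model.log_prob(zs), [zs])[0]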

data subsampling and streaming

We need to generalize our inference and model objects as follows:

  1. to do data subsampling as a default, for data sets of size greater than some threshold
  2. to still enable posterior inference when there is no data
  3. Data will be a separate object which is passed into the model; the model object's method log_prob(zs) should therefore also be a function of data points x (see the sketch after this list).
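
For item 3, a toy sketch of a log_prob that takes a minibatch and scales it so that it remains an unbiased estimate of the full-data log joint (all numbers and the model are illustrative):

import tensorflow as tf

N = 10000                                    # full data set size
M = 4                                        # minibatch size
xs = tf.constant([[0.], [1.], [1.], [0.]])   # minibatch of M data points
zs = tf.random_normal([5, 1])                # 5 samples of a 1-d latent

# Toy model: z is a shared logit with a standard normal prior (up to a
# constant).
log_prior = -0.5 * tf.reduce_sum(zs * zs, 1)
p = tf.sigmoid(zs)
log_lik = tf.reduce_sum(
    tf.transpose(xs) * tf.log(p) + tf.transpose(1.0 - xs) * tf.log(1.0 - p), 1)

# Scaling by N/M makes the minibatch log-likelihood an unbiased estimate
# of the full-data log-likelihood.
log_prob = log_prior + (float(N) / M) * log_lik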

support for pymc3 as a modeling language?

We want to support modeling languages which are either popular or are useful for certain tasks over the alternatives we support. With that in mind, pymc3 seems appealing for specifying large discrete latent variable models. You can't write them as a Stan program, and it could be rather annoying to code them up in raw TensorFlow or NumPy/SciPy.

On the other hand, it's one more modeling language to maintain; pymc3 actually uses Theano as a backend, which may lead to some bugs(?); and I don't know how popular pymc3 would be as a use case over the other supported languages.

tutorials

GPy has incredible tutorial notebooks. They teach basic concepts related to GPs and use GPy as a basis for explaining these concepts. I think it's a great idea, and I've personally found them useful as I dug through the GP literature.

I think we can do something similar. Each notebook instructively explains a concept. The code is similar to the end-to-end examples currently in the repo, but with more words and explanation about the various options that are used. Here are some examples of concepts to teach:

  • Black box variational inference
  • Mixture modeling with stochastic variational inference
  • Posterior predictive check

unifying data specification

Dave brought up the idea of abstracting the data specification: users will have one way to specify the data, and in the backend the data provided will be morphed into a Stan data model, a TensorFlow data model, etc.

automate choice of loss function

for mean-field variational inference:

  • score function gradient (general form)
  • reparameterization gradient (general form)
  • score function gradient when H(q) is analytic
  • reparameterization gradient when H(q) is analytic
  • score function gradient when KL(q(z) || p(z)) is analytic
  • reparameterization gradient when KL(q(z) || p(z)) is analytic

(I don't know how to tell the algorithm when KL(q(z) || p(z)) is analytic, or more generally when any portion of the generative model is.)
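
A sketch of the dispatch (every helper name here is hypothetical; the open question above is exactly how has_analytic_kl would be implemented):

def choose_loss(q, p_model):
    # `reparam` switches between reparameterization and score function
    # gradients; the build_* functions construct the corresponding loss.
    reparam = hasattr(q, 'reparam_sample')
    if has_analytic_kl(q, p_model):
        return build_analytic_kl_loss(q, p_model, reparam=reparam)
    elif hasattr(q, 'entropy'):
        return build_analytic_entropy_loss(q, p_model, reparam=reparam)
    else:
        return build_general_loss(q, p_model, reparam=reparam)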

streaming data

Implementation-wise, this is the setting where we can't store all of the data inside the Data class. What's the best way to handle this when streaming is integrated into the iterations of inference.run()?

match outputs of distribution methods to SciPy convention

Currently the output for distribution methods (e.g., logpdf(), logpmf(), entropy()) is a scalar.

SciPy's calls vary their output dimensions depending on the input. For example, the following outputs a scalar, a 1-dimensional vector, and a 1x1 matrix, respectively.

from scipy import stats

print(stats.norm.logpdf(0.0))      ## -0.918938533205
print(stats.norm.logpdf([0.0]))    ## [-0.91893853]
print(stats.norm.logpdf([[0.0]]))  ## [[-0.91893853]]

We've so far followed SciPy convention for names (distributions, methods, arguments) and the flexibility in the input by casting and squeezing. Therefore I think we should also follow this output behavior.

float64

Write all computation to use float64: use tf.float64 and np.float64 when specifying dtypes, and cast objects to float64 where necessary.

design base class for variational models

What's the right abstraction, and which base class methods and members should all variational model classes share? Further, how do we mix and match them, so it's not as blocky as "MFGaussian" but can, e.g., offer a choice of variational family for each dimension, or specification of a joint distribution?

This choice will be particularly relevant for designing classes for hierarchical variational models, in which you will have some arbitrary stacking of these guys.
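
One possible starting point (a sketch; a per-dimension container is one way to be less "blocky" than MFGaussian, and all names are hypothetical):

class Variational(object):
    """Base class sketch for variational families q(z | lambda)."""
    def sample(self, size=1):
        raise NotImplementedError()

    def log_prob(self, zs):
        raise NotImplementedError()

class Factorized(Variational):
    """Mean-field container: a possibly different family per dimension."""
    def __init__(self, factors):
        self.factors = factors  # list of Variational objects, one per dim

    def log_prob(self, zs):
        # zs is an (n_samples, d) array; sum the per-dimension log densities.
        return sum(f.log_prob(zs[:, i]) for i, f in enumerate(self.factors))

Hierarchical variational models could then stack such objects, with one Variational acting as the prior over another's parameters.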

Stochastic Block Model

Can I define/learn a stochastic block model (SBM) with Edward, and if I can, what inference engine should I use?

With pymc3 I would define an SBM, for some value K (the number of blocks/clusters), as follows:

import numpy as np
import pymc3 as pm

# K, N, and R_observed (the observed adjacency matrix) are defined elsewhere.
with pm.Model():
    # group assignment probabilities
    pi = pm.Dirichlet(name='pi', a=5 * np.ones(K))
    # group assignments for nodes
    q = pm.Categorical(name='q', p=pi, shape=N)
    # connection probabilities among groups
    eta = pm.Beta(name='eta', alpha=1, beta=1, shape=(K, K))
    # adjacency matrix
    R = pm.Bernoulli(name='R', p=eta[q, :][:, q], observed=R_observed, shape=(N, N))


In other words, I model my graph (with N nodes and some edges) by modeling the adjacency matrix R. Each edge (i, j) is independent of the other edges and is generated as follows:

  1. sample pi from Dirichlet
  2. sample q_i the categorical cluster assignment for the i-th node given pi
  3. sample q_j the categorical cluster assignment for the j-th node given pi
  4. from the K x K matrix of Beta variables pick the Beta variable corresponding to the cluster assignments (e.g. q_i=a and q_j=b => we sample from the Beta variable at index a, b)
  5. form an edge according to a Bernoulli whose success probability is the Beta variable chosen in the previous step

The adjacency matrix R is what is observed, and I would like to learn the latent eta and q. The MCMC approach with pymc3 works well enough, but a variational approach might be better.

If this kind of question is not suited here, please let me know.

decide on time schedule for public release

I think the library is ready to go public once we have:

  • hierarchical variational models
  • data subsampling
  • interesting model examples
  • documentation on how to use the repo as a research tool
  • a finalized software name

We also need to figure out how to deal with active research branches on a public repo. We want these active research branches to be private. I think the best answer to this is to fork the public repo, as a private repo still on github.com/Blei-Lab. Then we have branches on the private repo, and whenever we think it should go into the public repo itself, we submit a pull request to the public repo.

vectorize calls to distribution log densities

Consider a B x d array of zs, where a row corresponds to one sample of a d-dimensional latent variable, and we have a mini-batch of size B.

Univariate Distributions
For mean-field methods, we'd like to do something like call bernoulli.logpmf(zs[:, i], p), where p is a scalar in [0,1]. This returns a B-dimensional vector,

[ log Bernoulli(zs[1, i] | p), ..., log Bernoulli(zs[B, i] | p) ]^T

For a univariate distribution, it takes a B-dimensional input and returns a B-dimensional output.

Multivariate Distributions
Consider a d-dimensional multivariate Gaussian. We call multivariate_normal.logpdf(zs, mu, Sigma), where mu is d-dimensional, Sigma is d x d, and it returns a B-dimensional vector

[ log Normal(zs[1, :] | mu, Sigma), ..., log Normal(zs[B, :] | mu, Sigma) ]^T

For a d-dimensional distribution, it takes a B x d matrix of inputs and returns a B-dimensional output.

SciPy does this too!

import numpy as np
from scipy import stats

#4-d vector input, univariate normal
stats.norm.logpdf([0.0, 1.0, 1.0, 2.0], loc=0, scale=1)
## array([-0.91893853, -1.41893853, -1.41893853, -2.91893853])

#4 x 2 matrix input, 2-d normal
stats.multivariate_normal.logpdf(
    np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]), 
    mean=np.zeros(2), cov=np.diag(np.ones(2)))
## array([ -1.83787707,  -2.83787707,  -5.83787707, -10.83787707])

Higher-dimensional arguments
We can also consider something like bernoulli.logpmf(zs[:, i], ps), where not only is zs[:, i] a B-dimensional vector but ps is also a B-dimensional vector (in [0,1]^B). I propose not doing this: it is bound to lead to bugs. Any time this comes up, I propose we do individual calls, bernoulli.logpmf(zs[1, i], ps[1]) and so on.

(I don't know a situation where this comes up enough that vectorizing this computation is crucial. If we notice this we can make the change. I don't think SciPy allows this either.)
