blei-lab / edward
A probabilistic programming language in TensorFlow. Deep generative models, variational inference.
Home Page: http://edwardlib.org
License: Other
Discrete
Continuous
useful references:
See the branch original for its NumPy/SciPy implementation.
This will require quite a bit of foresight. We need the right structure to allow future algorithmic endeavors, e.g., sampling routines, meta-inference/hyperparameter optimization, optimization-based stuff such as tempering.
Given the current structure, we should be fine to do the stuff we currently want to do (inference on HVMs, alpha/f-divergences).
Can I define/learn a stochastic block model (SBM) with Edward, and if so, what inference engine should I use?
With pymc3 I would define SBM, for some value K (number of blocks/clusters) as such:
```python
# group assignment probabilities
pi = Dirichlet(name='pi', a=5 * np.ones(K))
# group assignments for nodes
q = Categorical(name='q', p=pi, shape=N)
# connection probabilities among groups
eta = Beta(name='eta', alpha=1, beta=1, shape=(K, K))
# adjacency matrix
R = Bernoulli(name='R', p=eta[q, :][:, q], observed=R_observed, shape=(N, N))
```
As a probabilistic graphical model it would look like this:
In other words I model my graph (with N nodes and some edges) by modeling the adjacency matrix `R`. Each edge `i, j` is independent from the other edges and is generated as such:

- draw `pi` from a Dirichlet
- draw `q_i`, the categorical cluster assignment for the i-th node, given `pi`
- draw `q_j`, the categorical cluster assignment for the j-th node, given `pi`
- from the `K x K` matrix of Beta variables, pick the Beta variable corresponding to the cluster assignments (e.g. `q_i = a` and `q_j = b` => we sample from the Beta variable at index `a, b`)

The adjacency matrix `R` is what is observed, and I would like to learn the latent `eta` and `q`. The MCMC approach with pymc3 works well enough, but a variational approach might be better.
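For concreteness, the generative process above can be sketched as a plain NumPy forward simulation (a toy sketch, not a pymc3 or Edward model; the names mirror the description above):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 30, 3  # number of nodes and number of blocks/clusters

pi = rng.dirichlet(5 * np.ones(K))   # group assignment probabilities
q = rng.choice(K, size=N, p=pi)      # categorical group assignment per node
eta = rng.beta(1, 1, size=(K, K))    # connection probabilities among groups
R = rng.binomial(1, eta[q][:, q])    # N x N binary adjacency matrix
```

Note `eta[q][:, q]` is the NumPy spelling of `eta[q, :][:, q]` from the pymc3-style snippet: it expands the K x K block probabilities to an N x N matrix of edge probabilities.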
If this kind of question is not suited here please let me know.
What's the right abstraction, and what base class methods and members should all variational model classes share? Further, how do we mix and match them, so it's not as blocky as "MFGaussian" but can, e.g., be a choice of variational family for each dimension, or a specification of a joint distribution.
This choice will be particularly relevant for designing classes for hierarchical variational models, in which you will have some arbitrary stacking of these guys.
This is the first page users will read when they learn how to use the library.
I'm not sure how much this affects our current runtimes, but right now, in order to sample from q, we have to run the session to realize the variational parameters and then sample using scipy.
TensorFlow supports uniform and standard normal, and from there we can try to use clever hacks in the literature to get fast and numerically stable samples from other distributions (Devroye, 1986?). Here are some off the top of my head:
- `tf.random_normal`
- a transform of `tf.random_normal`
- `tf.select` from a uniform
- `eps ~ Unif(0,1)`, then `log(1/eps)` to transform it

References
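As a sketch of the inverse-CDF trick in the last item (plain NumPy for illustration; a TensorFlow version would start from its uniform sampler instead):

```python
import numpy as np

def sample_exponential(rate, size, seed=0):
    # Inverse-CDF trick: if eps ~ Unif(0,1), then log(1/eps) / rate ~ Exponential(rate).
    rng = np.random.default_rng(seed)
    eps = rng.uniform(size=size)
    return np.log(1.0 / eps) / rate

samples = sample_exponential(2.0, size=100000)
```

The sample mean should be close to `1/rate = 0.5`, which is a quick sanity check on the transform.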
It's already taken on the python package index :( https://pypi.python.org/pypi/blackbox/0.7.1
There are many small functions in `edward/util.py` that we depend on; let's write unit tests for them.
How do we subsample data for matrices, ragged arrays (e.g., time series), tensors, etc.?
Dave brought up the idea of abstracting the data specification: users will have one way to specify the data, and in the backend the data provided will be morphed into a Stan data model, a TensorFlow data model, etc.
GPy has incredible tutorial notebooks. They teach basic concepts related to GPs and use GPy as a basis for explaining these concepts. I think it's a great idea, and I've personally found them useful as I dug through the GP literature.
I think we can do something similar. Each notebook instructively explains a concept. The code is similar to the end-to-end examples currently in the repo, but with more words and explanation about the various options that are used. Here are some examples of concepts to teach:
functions from data point to variational parameters
useful references:
We want to support modeling languages which are either popular or are useful for certain tasks over the alternatives we support. With that in mind, pymc3 seems appealing for specifying large discrete latent variable models. You can't write them as a Stan program, and it could be rather annoying to code them up in raw TensorFlow or NumPy/SciPy.
On the other hand, it's one more modeling language to maintain; pymc3 actually uses Theano as a backend, which may lead to some bugs(?); and I don't know how popular pymc3 would be as a use case over the other supported languages.
see #62 (comment)
what are good default functions we should have for plotting?
Right now in all examples we run inference and stop after looking at the ELBO via the print progress. We should have a simple function to do predictive checks (both prior and posterior).
Here's a general spec:
```python
def predictive_check(T, likelihood, latent, S=100):
    """Predictive check.

    It is a prior predictive check if latent is the prior
    and a posterior predictive check if latent is the posterior.
    (Box, 1980; Gelman, Meng, and Stern, 1996)

    It forms an empirical distribution for the predictive discrepancy,
        q(T) = \int p(T(x) | z) q(z) dz
    by drawing S replicated data sets xrep and calculating T(xrep)
    for each data set.

    Arguments
    ---------
    T : function
        Test statistic.
    likelihood : class with 'sample' method
        Likelihood distribution p(x | z) to sample from.
    latent : class with 'sample' method
        Latent variable distribution q(z) to sample from.
    S : int, optional
        Number of replicated data sets for the predictive check.

    Returns
    -------
    np.ndarray
        Vector of S elements, (T(xrep^1), ..., T(xrep^S)).
    """
```
The `Data` class should have a unified data structure for housing the data, so we can write these methods against that one structure. Then it should output the corresponding object of class `Tensor`, `dict`, or `np.ndarray`, depending on the modeling language.
Write all computation to use float64, e.g., use `tf.float64` and `np.float64` when specifying `dtype`s, and cast objects to float64 if necessary.
for mean-field variational inference:
(I don't know how to tell the algorithm when KL(q(z) || p(z)) is analytic, or more generally when any portion of the generative model is.)
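As a concrete instance of the analytic case: when q and p are both Gaussians, KL(q(z) || p(z)) has a closed form, which a Monte Carlo estimate of the same quantity should match (illustrative NumPy sketch, not Edward code):

```python
import numpy as np

def kl_normal(mu_q, sig_q, mu_p, sig_p):
    # Closed-form KL(N(mu_q, sig_q^2) || N(mu_p, sig_p^2)).
    return (np.log(sig_p / sig_q)
            + (sig_q**2 + (mu_q - mu_p)**2) / (2 * sig_p**2)
            - 0.5)

rng = np.random.default_rng(0)
z = rng.normal(1.0, 0.5, size=200000)  # z ~ q = N(1, 0.5^2)
log_q = -0.5 * ((z - 1.0) / 0.5)**2 - np.log(0.5 * np.sqrt(2 * np.pi))
log_p = -0.5 * z**2 - np.log(np.sqrt(2 * np.pi))

mc_kl = np.mean(log_q - log_p)          # Monte Carlo estimate of KL(q || p)
analytic_kl = kl_normal(1.0, 0.5, 0.0, 1.0)
```

An algorithm that knows the KL term is analytic can use `analytic_kl` directly and only estimate the remaining expectation, reducing gradient variance.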
Start formalizing unit testing procedure. #1 also relevant
It'd be nice to have a language to specify directed acyclic graphs, so we can take advantage of Rao-Blackwellization/Markov blankets. Also it'd be nice to take advantage of the case when all full conditional distributions are an exponential family.
We need to generalize our inference and model objects as follows:
- `log_prob(zs)` should also be a function of data points `x`.
- learning model parameters
It'd be great to have high profile model examples which we can highlight on the front page. Some ideas:
We can think about the choice of inference algorithm and data for the above later.
Currently the output for distribution methods (e.g., `logpdf()`, `logpmf()`, `entropy()`) is a scalar.
SciPy's calls vary their output dimensions depending on the input. For example, the following output a scalar, a 1-dimensional vector, and a 1x1 matrix, respectively.

```python
from scipy import stats
print(stats.norm.logpdf(0.0))
print(stats.norm.logpdf([0.0]))
print(stats.norm.logpdf([[0.0]]))
```
We've so far followed SciPy convention for names (distributions, methods, arguments) and the flexibility in the input by casting and squeezing. Therefore I think we should also follow this output behavior.
We currently use `n_minibatch` to define the number of samples we use for Monte Carlo estimation of the gradient. We also use the same `n_minibatch` to evaluate the objective function (e.g. the ELBO) every `n_print` iterations. We might want to use a larger number, `n_minibatch_obj`, to evaluate the objective function; this would help in assessing convergence.
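A toy illustration of the trade-off (plain NumPy; the point is that a small sample count is fine for noisy gradient steps, while a larger `n_minibatch_obj`-style count gives a much less noisy objective estimate for convergence checks):

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_estimate(n_samples):
    # Monte Carlo estimate of E[z^2] = 1 under z ~ N(0, 1).
    z = rng.normal(size=n_samples)
    return np.mean(z**2)

# Small count (gradient-style) vs. large count (objective-evaluation-style).
small = np.array([mc_estimate(10) for _ in range(500)])
large = np.array([mc_estimate(1000) for _ in range(500)])
```

Both estimators are unbiased, but the large-sample one has far smaller spread, so it is the better choice for deciding whether the ELBO has converged.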
I think the library is ready to go public once we have:
We also need to figure out how to deal with active research branches on a public repo. We want these active research branches to be private. I think the best answer to this is to fork the public repo, as a private repo still on github.com/Blei-Lab. Then we have branches on the private repo, and whenever we think it should go into the public repo itself, we submit a pull request to the public repo.
Our algorithms perform inference on a `Model` class and require only one thing from it: the method `log_prob(self, zs)`, which takes an `n_minibatch x d` matrix of latent variables `z` and outputs a vector `[log p(x, z_{1,:}), ..., log p(x, z_{n_minibatch,:})]^T`. (Any data `x` is fixed and stored inside the class.)

In cases where we use the reparameterization gradient, we also require the gradient of the `log_prob()` function with respect to `z`. This is done automatically if `log_prob()` is implemented in TensorFlow.
We want to extend this to a manual specification of the gradient. The motivation is twofold: (1) a user may want to supply hand-written `log_prob()` and `grad_log_prob` functions; (2) wrapped modeling languages such as Stan provide their own gradients.

All of this should go behind the scenes: the algorithms will simply check if the `grad_log_prob()` method exists. If it exists, it uses the function. If it doesn't exist, then it tries to autodiff it using TensorFlow. This solves 1. To solve 2, we also require a wrapper that wraps a Stan program and data into this class with the two methods `log_prob` and `grad_log_prob`.
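A behind-the-scenes dispatch like the one described could be sketched as follows (hypothetical class names, NumPy in place of TensorFlow autodiff for a self-contained example):

```python
import numpy as np

class ManualGradModel:
    """Model that supplies its own gradient (as a Stan wrapper would)."""
    def log_prob(self, zs):
        return -0.5 * np.sum(zs**2, axis=1)  # log N(z | 0, I), up to a constant
    def grad_log_prob(self, zs):
        return -zs                           # analytic gradient w.r.t. z

class AutodiffModel:
    """Model with only log_prob; gradients must be derived automatically."""
    def log_prob(self, zs):
        return -0.5 * np.sum(zs**2, axis=1)

def get_gradient(model, zs):
    """Use grad_log_prob if the model provides it, else fall back.

    The fallback here is finite differences; in the design above it
    would be TensorFlow's automatic differentiation of log_prob.
    """
    if hasattr(model, 'grad_log_prob'):
        return model.grad_log_prob(zs)
    eps = 1e-6
    grads = np.empty_like(zs)
    base = model.log_prob(zs)
    for j in range(zs.shape[1]):
        zp = zs.copy()
        zp[:, j] += eps
        grads[:, j] = (model.log_prob(zp) - base) / eps
    return grads

zs = np.array([[1.0, 2.0], [0.5, -0.5]])
```

Both models expose the same interface to the inference algorithm; the `hasattr` check keeps the dispatch invisible to the user.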
This is a result of us changing how we parameterize the distributions, and letting the TensorFlow variables live outside the variational classes.
let's make sure we don't write an entire library that only works for ed.set_seed(42)
Generalize the current TensorFlow backend to also enable the use of Theano as an alternative.
Consider a `B x d` array of `zs`, where a row corresponds to one sample of a `d`-dimensional latent variable, and we have a mini-batch of size `B`.
Univariate Distributions
For mean-field methods, we'd like to do something like call `bernoulli.logpmf(zs[:, i], p)`, where `p` is a scalar in [0,1]. This returns a `B`-dimensional vector,

[ log Bernoulli(zs[1, i] | p), ..., log Bernoulli(zs[B, i] | p) ]^T

For a univariate distribution, it takes a `B`-dimensional input and returns a `B`-dimensional output.
Multivariate Distributions
Consider a `d`-dimensional multivariate Gaussian. We call `multivariate_normal.logpdf(zs.transpose(), mu, Sigma)`, where `mu` is `d`-dimensional and `Sigma` is `d x d`, and it returns a `B`-dimensional vector

[ log Normal(zs[1, :] | mu, Sigma), ..., log Normal(zs[B, :] | mu, Sigma) ]^T

For a `d`-dimensional distribution, it takes a `B x d` matrix of inputs and returns a `B`-dimensional output.
SciPy does this too!
```python
import numpy as np
from scipy import stats

# 4-d vector input, univariate normal
stats.norm.logpdf([0.0, 1.0, 1.0, 2.0], loc=0, scale=1)
## array([-0.91893853, -1.41893853, -1.41893853, -2.91893853])

# 4 x 2 matrix input, 2-d normal
stats.multivariate_normal.logpdf(
    np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]),
    mean=np.zeros(2), cov=np.diag(np.ones(2)))
## array([ -1.83787707,  -2.83787707,  -5.83787707, -10.83787707])
```
Higher-dimensional arguments
We can also consider something like `bernoulli.logpmf(zs[:, i], ps)`, where not only is `zs[:, i]` an `M`-dimensional vector but `ps` is also an `M`-dimensional vector (in `[0,1]^M`). I propose not doing this. This is bound to lead to bugs. Any time this comes up, I propose we do individual calls, `bernoulli.logpmf(zs[1, i], ps[i])` and so on.
(I don't know a situation where this comes up enough that vectorizing this computation is crucial. If we notice this we can make the change. I don't think SciPy allows this either.)
Implementation-wise, this is the setting where we can't store all of the data inside the `Data` class. What's the best way to handle this problem when streaming is integrated into the iterations of `inference.run()`?
reference implementations
We currently default to the reparameterization gradient if the `Variational` class implements `reparam`. However, if the `Inference` class does not support reparameterization gradients (e.g. `KLpq`), then it doesn't matter whether the `Variational` class implements it or not.
pls respond
that would be amazing :)
Congrats Dustin for Edward! It seems very promising for easing the inference of probabilistic models.
I've seen that documentation is in progress, but let me report a couple of things, in case you'd like to fix.
In the Tutorial for Research there's a wrong link, which points to the Stan example instead of the TensorFlow one.
"Here is a toy script that uses this model which is written in TensorFlow."
And where you say NumPy/SciPy, I guess you refer to Stan, since the link points to Stan and the NumPy/SciPy example is above.
"Here is another toy script that uses the same model written in NumPy/SciPy."
Seems like Keras is winning out among all the deep-learning abstraction libraries.
It would be great to be compatible with Keras' compositionality for adding neural network layers. This would be convenient for stacking NN layers following their Sequential and Graph model classes, as options to parameterize the probability model and/or the variational model.
It looks like all we need to do is write a wrapper around their `evaluate()` method, and hack evaluation of the log-likelihood as the compiled model object's "loss".
Convert everything to NumPy/SciPy standards. More documentation on developer process in our wiki page.