thu-ml / zhusuan Goto Github PK

View Code? Open in Web Editor NEW

2.2K 144.0 421.0 1.79 MB

A probabilistic programming library for Bayesian deep learning, generative models, based on Tensorflow

Home Page: http://zhusuan.readthedocs.io

License: MIT License

Python 100.00%

bayesian-inference probabilistic-programming graphical-models generative-models deep-learning

zhusuan's Introduction

ZhuSuan is a Python probabilistic programming library for Bayesian deep learning, which conjoins the complimentary advantages of Bayesian methods and deep learning. ZhuSuan is built upon TensorFlow. Unlike existing deep learning libraries, which are mainly designed for deterministic neural networks and supervised tasks, ZhuSuan provides deep learning style primitives and algorithms for building probabilistic models and applying Bayesian inference. The supported inference algorithms include:

Variational Inference (VI) with programmable variational posteriors, various objectives and advanced gradient estimators (SGVB, REINFORCE, VIMCO, etc.).
Importance Sampling (IS) for learning and evaluating models, with programmable proposals.
Hamiltonian Monte Carlo (HMC) with parallel chains, and optional automatic parameter tuning.
Stochastic Gradient Markov Chain Monte Carlo (SGMCMC): SGLD, PSGLD, SGHMC, and SGNHT.

Installation

ZhuSuan is still under development. Before the first stable release (1.0), please clone the repository and run

pip install .

in the main directory. This will install ZhuSuan and its dependencies automatically. ZhuSuan also requires TensorFlow 1.13.0 or later. Because users should choose whether to install the cpu or gpu version of TensorFlow, we do not include it in the dependencies. See Installing TensorFlow.

If you are developing ZhuSuan, you may want to install in an "editable" or "develop" mode. Please refer to the Contributing section below.

Documentation

Examples

We provide examples on traditional hierarchical Bayesian models and recent deep generative models.

To run the provided examples, you may need extra dependencies to be installed. This can be done by

pip install ".[examples]"

Gaussian: HMC
Toy 2D Intractable Posterior: SGVB
Bayesian Neural Networks: SGVB, SGMCMC
Variational Autoencoder (VAE): SGVB, IWAE
Convolutional VAE: SGVB
Semi-supervised VAE (Kingma, 2014): SGVB, Adaptive IS
Deep Sigmoid Belief Networks Adaptive IS, VIMCO
Logistic Normal Topic Model: HMC
Probabilistic Matrix Factorization: HMC
Sparse Variational Gaussian Process: SGVB

Citing ZhuSuan

If you find ZhuSuan useful, please cite it in your publications. We provide a BibTeX entry of the ZhuSuan white paper below.

@ARTICLE{zhusuan2017,
    title={Zhu{S}uan: A Library for {B}ayesian Deep Learning},
    author={Shi, Jiaxin and Chen, Jianfei. and Zhu, Jun and Sun, Shengyang
    and Luo, Yucen and Gu, Yihong and Zhou, Yuhao},
    journal={arXiv preprint arXiv:1709.05870},
    year=2017,
}

Contributing

We always welcome contributions to help make ZhuSuan better. If you would like to contribute, please check out the guidelines here.

zhusuan's People

Contributors

Stargazers

Watchers

Forkers

thjashin yanan-zheng statml wycharry allensmile liuxin11235 wujsact linsong8208 yiyang92 winnerineast tobechao eric-seekas inspur-iop linkdo2 benjamesbabala cosmozhang billhsia caibing1872 bit10011001 world2005 leoleishi ml-lab hannah1999 alfiyazi xuanhan863 scarita oftensmile lfthwjx daisenryaku chenghuige jaloveapple youngleec meta-inf zmoon111 xchuwenbo songmingye94 ginobilinie nagizeroiw jingyonghou xiangyongcao zhouqingping dongfeng951 lyk125 weiyuesun tigerneil dalchemisy sourcestudy lixiaosi33 jinyu0310 mcckk pharrellwang jeanru duolinwang objects-time kmustyxl merz9b jsonac xia1299 trantorrepository canoefzh yzy5630 pengchengma ranchen1994 huajh jazzeyoung wangyaobupt zhly0 squirrel1982 akinswin tusonggao wlzhong hitszzx tnlin qixiangyujj jane8816 situgongyuan zyubin cjf00000 korepwx liuhengguang eece-23 kkpop ajoeajoe lijinzhao86 tandychao s1y1111 adambear-me gitter-badger achao2013 trigrass2 solertis yancz1989 captainmushroom little1tow desperado1992 korterling adong7639 sli1989 andrewganjinrui gracekk

zhusuan's Issues

Question on the parameters representation in the example of Bayesian NNs

Best greetings, dear authors:

I truly appreciate this package and I am playing with one of the examples (Bayesian NNs).

Honestly speaking, I am not working in this specific field and have limited background on this topic. But I would really appreciate that if you could tell the physical meaning of some evaluation parameters in your provided code (bayesian_nn.py).

Starts from line 86 and ends at line 95, I am wondering what's the role of "lb_samples, ll_samples, epochs, test_freq, anneal_lr_freq, anneal_lr_rate".

Hope it's not much work and thanks for your time!

Normal does not accept np.ndarray

        beta = Normal(np.zeros((n_dims)), np.ones((n_dims)) * np.log(sigma))

vae_conv.py NAN when tensorflow upgraded

vae_conv.py has NAN bug when tensorflow is upgraded to 0.10.0rc0 from 0.9.0. Both under prettytensor (0.6.2).

Can't compute prior (local_log_prob) of a StochasticTensor inside tf.scan (in LSTM cell)

Hi,

I tried to implement the bayesian_rnn from the docs.
However, while trying to compute log_joint, I can't compute the log of prior log_pz because w is declared within a LSTM cell, so I get the following error:

Traceback (most recent call last):
  File "blstm.py", line 111, in <module>
    joint_ll = log_joint({'x': x, 'y_i': y_i, 'y_v': y_v})
  File "blstm.py", line 106, in log_joint
    log_pz, log_px_z = model.local_log_prob(['w', 'y_v'])  # Error
  File "/Users/jilljenn/code/vae/venv/lib/python3.6/site-packages/zhusuan/model/base.py", line 346, in local_log_prob
    ret.append(s_tensor.log_prob(s_tensor.tensor))
  File "/Users/jilljenn/code/vae/venv/lib/python3.6/site-packages/zhusuan/model/base.py", line 140, in log_prob
    return self._distribution.log_prob(given)
  File "/Users/jilljenn/code/vae/venv/lib/python3.6/site-packages/zhusuan/utils.py", line 215, in _func
    return f(*args, **kwargs)
  File "/Users/jilljenn/code/vae/venv/lib/python3.6/site-packages/zhusuan/distributions/base.py", line 303, in log_prob
    log_p = self._log_prob(given)
  File "/Users/jilljenn/code/vae/venv/lib/python3.6/site-packages/zhusuan/distributions/univariate.py", line 180, in _log_prob
    return c - logstd - 0.5 * precision * tf.square(given - mean)
  File "/Users/jilljenn/code/vae/venv/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 979, in binary_op_wrapper
    return func(x, y, name=name)
  File "/Users/jilljenn/code/vae/venv/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 8009, in sub
    "Sub", x=x, y=y, name=name)
  File "/Users/jilljenn/code/vae/venv/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/Users/jilljenn/code/vae/venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
    op_def=op_def)
  File "/Users/jilljenn/code/vae/venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1760, in __init__
    self._control_flow_post_processing()
  File "/Users/jilljenn/code/vae/venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1769, in _control_flow_post_processing
    control_flow_util.CheckInputFromValidContext(self, input_tensor.op)
  File "/Users/jilljenn/code/vae/venv/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_util.py", line 263, in CheckInputFromValidContext
    raise ValueError(error_msg + " See info log for more details.")
ValueError: Cannot use 'model/scan/while/MatMul/Normal.sample/Squeeze' as input to 'Normal.log_prob/sub_1' because 'model/scan/while/MatMul/Normal.sample/Squeeze' is in a while loop. See info log for more details.

Up-to-run code:
https://github.com/jilljenn/vae/blob/master/blstm.py#L106

How to make it?

No module named examples

When I run the examples, such as python zhusuan/examples/bayesian_neural_nets/python bayesian_nn.py, I will get the error "ImportError: No module named examples". Please help me. Thank you.

Dirichlet + Categorical or Dirichlet + Multinomial toy example ?

Hello
Is it possible to add a little example for doing this:
Latent <- zs.distributions.Dirichlet(alpha_parameters)
Observations <- zs.distributions.Multinomial(Latent, n_experiments) ?

And the goal is to retrieve alpha_parameters when we have only Observations, with HMC or variationnal inference.

(My ultimate goal is to do:
Latent1 <- zs.distributions.Dirichlet(alpha_parameters)
Latent2 <- zs.distributions.Dirichlet(alpha_parameters)
Latent3 <- conv2d(Latent1,Latent2)
Observations <- zs.distributions.Multinomial(Latent3, n_experiments)
And retrieve again alpha_parameters. I could manage it with a toy Dirichlet+Cat/Mult example and especially I think that could be helpful for everyone to have such an example.)
Thanks a lot

Collaboration with TensorLayer

Hello dear friends,

I have been reading your work with much interest. With @zsdonghao @luomai @lgarithm we maintain the library TensorLayer (TL), https://github.com/tensorlayer/tensorlayer.

I think that your work can be really complementary to TL APIs.

Maybe you would be interested in working together and heading toward a common direction.

We have a chat group on Slack and/or WeChat if you wanna talk.

Have a nice day,

All the best,

Jonathan DEKHTIAR

tf compability

Please fix the tf call to stop the warning

/data/jianfei/zhusuan/zhusuan/distributions/univariate.py:69: UserWarning: Normal: The order of arguments logstd/std will change to std/logstd in the coming version.

ssl performance turns bad after recent commits

ssl performance turns bad after recent commits. 2500 epochs accuracy 96.93%

Questions about vae.py

regarding to vae.py, I have two questions：

what's the considerations to use zs.Implicit and zs.Empirical ? any benefits from using these two API?
x_mean = zs.Implicit("x_mean", tf.sigmoid(x_logits), group_ndims=1) in line 27
x = zs.Empirical('x', tf.int32, (None, x_dim)) in line 34
why the n_particles set for evaluating test_lb and test_ll differs so much (1 vs 1000)?
test_lb = sess.run(lower_bound, feed_dict={x: test_x_batch, n_particles: 1}) in line 125
test_ll = sess.run(is_log_likelihood,feed_dict={x: test_x_batch, n_particles: 1000}) in line 128

pytorch

Can zhusuan be applied within pytorch framework？
OR will it support pytorch in future？:-D

pairwise Markov random field

dear authors, thanks for this package. I am wondering if Markov random field can be implemented using zhusuan, is so, do you have a tutorial for me to go through? thank you.

Clarifying the * N in log_joint?

Hi!

Here:

N, n_x = x_train.shape

...

def log_joint(observed):
    model, _ = bayesianNN(observed, x, n_x, layer_sizes, n_particles)
    log_pws = model.local_log_prob(w_names)
    log_py_xw = model.local_log_prob('y')
    return tf.add_n(log_pws) + log_py_xw * N

Source: https://github.com/thu-ml/zhusuan/blob/master/examples/bayesian_neural_nets/bayesian_nn.py#L96

Shouldn't log_py_xw be multiplied by B the batch size corresponding to tf.shape(x)[0], instead of the size of the train set?

get_output does not accept tf.Variable as input

    y_out, beta_out = self.model.get_output([self.y, self.beta],
                                            inputs={self.y: y, self.beta: **tf.identity**(beta)})

I have some trouble translating a model from PyMC3

I'm very excited for this library, so I really want to learn how to use it. I'm translating the example Cheating among students from PyMC3 to ZhuSuan, from the book Probabilistic Programming & Bayesian Methods for Hackers.

It is a clever solution for inferring how many students cheated on an exam. It involves a "privacy algorithm" where students answer whether they cheated with the truth or randomly according to the flipping of two coins.

First coin flip:

Head => they answer with the truth. That is, if they cheated or not
Tails => they answer according to the second coin flip

Second coin flip:

Heads => they answer that they cheated
Tails => the answer that they didn't cheat

This way, we can't point to an individual student and claim that they cheated, since only the student knows the results of the coin flips. And so, their privacy remains protected.

After the survey, we observe that 35 students out of a 100 answered that they cheated. The inference problem is to infer the real probability of cheating. Note that even if no student cheated, we would expect that 25 students would answer that they cheated.

However, it seems I'm getting something wrong since my results don't look similar to the results obtained in the original code.

This is my graphical model (a running Jupiter Notebook for this code can be found here):

def create_model(observed):
  with zs.BayesianNet(observed=observed) as model:
    
    #this is what we want to infer, the probability that a student cheats
    cheating_freq = zs.Uniform('cheating_freq', minval=0.0, maxval=1.0, group_ndims=1)
    
    not_cheated = zs.Bernoulli('not_cheated', logits=n_ones*logit(cheating_freq), group_ndims=1)
    first_coin_flips = zs.Bernoulli('first_coin_flips', logits=n_ones*logit(0.5), group_ndims=1)
    second_coin_flips = zs.Bernoulli('second_coin_flips', logits=n_ones*logit(0.5), group_ndims=1)
    
    answers = first_coin_flips * not_cheated + (1 - first_coin_flips) * second_coin_flips
    
    observed_proportion = tf.cast(tf.reduce_sum(answers)/N, tf.float32)
    observations = zs.Binomial('observations', logits=logit(observed_proportion),
                               n_experiments=tf.constant(N), group_ndims=1)
  return model

This is the code for the inference model:

n_chains = 1
n_iters = 40000
burnin = 20000
n_leapfrogs = 5

adapt_step_size = tf.placeholder(tf.bool, shape=[], name="adapt_step_size")
adapt_mass = tf.placeholder(tf.bool, shape=[], name="adapt_mass")
hmc = zs.HMC(step_size=1e-3, n_leapfrogs=n_leapfrogs, 
             adapt_step_size=adapt_step_size, adapt_mass=adapt_mass,
             target_acceptance_rate=0.9)

def log_joint(observed):
  model = create_model(observed)
  cf, nc = model.local_log_prob(['cheating_freq', 'not_cheated'])
  return cf + nc

qcheating_freq = tf.Variable(0.1 * tf.ones([n_chains,1]), trainable=True)
obs = tf.constant([35])
sample_op, hmc_info = hmc.sample(log_joint, observed={'observations':obs}, latent={'cheating_freq':qcheating_freq})

And this is the code for running the inference:

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  samples = []
  print('Sampling...')
  for i in range(n_iters):
      _, fc_sample, acc, ss = sess.run(
          [sample_op, hmc_info.samples['cheating_freq'],
           hmc_info.acceptance_rate,hmc_info.updated_step_size],
          feed_dict={adapt_step_size: i < burnin // 2, adapt_mass: i < burnin // 2}
      )
      if i % 250 == 0:
          print('Sample {}: Acceptance rate = {}, updated step size = {}'
                  .format(i, np.mean(acc), ss))
      if i >= burnin:
          samples.append(fc_sample)
  print('Finished.')
  samples = np.vstack(samples)

If you have the time, I'll appreciate your help a lot.

suggest a walkthrough jupyter notebook

Hi, authors,

It would be very helpful if there's walkthrough jupyter notebook like this one
.
Thanks!

Computation of gradient w.r.t. Distribution parameters

I have detailedly studied the code under the folders zhusuan/distributions and zhusuan/model. Yet, I am still confused by how the gradients are computed by Tensorflow. Since a number of samples are used to caluated the cost at each optimization Op, I am not sure why this is reversible when taking the gradients w.r.t. Distribution parameters (e.g. mean and log_std in Normal), how this is handled by Tensorflow. It is much appreciated if you can suggest me some materials which I have missed to read.

Posterior and parameters analysis

I wonder if there are some approaches for posterior and parameters analysing? I wish you could provide a summary of model and plot API. Or, some examples for analysing and Visualizing model parameters are very helpful.

Fix tf.contrib.graph_editor issues before merging to tensorflow master

This is a temporal fix for tf.contrib.graph_editor issues before my fix is merged to tensorflow master. Use this before running examples under model branch
Apply a patch to

.../tensorflow/contrib/graph_editor/transform.py

in line 449

# original
# remaining_roots = [
#         op for op in remaining_ops
#         if not op.outputs and not self._info.control_outputs.get(op)
#     ]

# new
remaining_roots = [
        op for op in remaining_ops
        if not op.outputs  # and not self._info.control_outputs.get(op)
    ]

Clarification on HMC

I have two questions regarding the HMC implementation:

Why the default target_acceptance_rate is 0.8 and not 0.65, which is most widely used in the literature.
Is there a white paper about the method used for adapting the mass? (I assume this is the diagonal covariance of the velocity vectors)

AttributeError: module 'progressbar' has no attribute 'DataSize'

When I run the examples, such as 'python -m examples.variational_autoencoders.vae' and 'python -m examples.bayesian_neural_nets.bayesian_nn', I will get the error "AttributeError: module 'progressbar' has no attribute 'DataSize'". Please help me. Thank you.

Issue with the shapes required in tf.layers

Following a standard VAE example for the geneative model that looks like this:

@zs.reuse("p")
def gen2(func, observed, bs, dim_z, n_samples):
    with zs.BayesianNet(observed=observed) as p:
        z_mean = tf.zeros([bs, dim_z])
        z = zs.Normal("z", z_mean, std=1., group_ndims=1, n_samples=n_samples)
        x_logits = func(z)
        x = zs.Bernoulli("x", x_logits, group_ndims=1)
    return p

where func just applies a bunch of dense layers from tf.layers I get the error:

Traceback (most recent call last):
  File "/opt/pycharm-2017.2.3/helpers/pydev/pydevd.py", line 1599, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/opt/pycharm-2017.2.3/helpers/pydev/pydevd.py", line 1026, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/opt/pycharm-2017.2.3/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/alex/work/python/eval_vi/bin/test.py", line 37, in <module>
    p = gen2(gen_func, {"z": qz_samples, "x": x_bin}, bs, dim_z, n_samples)
  File "/opt/miniconda3/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 278, in __call__
    result = self._call_func(args, kwargs, check_for_new_variables=False)
  File "/opt/miniconda3/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 217, in _call_func
    result = self._func(*args, **kwargs)
  File "/home/alex/work/python/eval_vi/bb_eval_vi/models/__init__.py", line 82, in gen2
    x_logits = func(z)
  File "/home/alex/work/python/eval_vi/bb_eval_vi/models/__init__.py", line 47, in dense_stack
    return stack.apply(inputs)
  File "/opt/miniconda3/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 721, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "/opt/miniconda3/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 594, in __call__
    input_shapes = nest.map_structure(lambda x: x.get_shape(), inputs)
  File "/opt/miniconda3/lib/python3.6/site-packages/tensorflow/python/util/nest.py", line 387, in map_structure
    structure[0], [func(*x) for x in entries])
  File "/opt/miniconda3/lib/python3.6/site-packages/tensorflow/python/util/nest.py", line 387, in <listcomp>
    structure[0], [func(*x) for x in entries])
  File "/opt/miniconda3/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 594, in <lambda>
    input_shapes = nest.map_structure(lambda x: x.get_shape(), inputs)
AttributeError: 'Normal' object has no attribute 'get_shape'

originally defined at:
  File "/home/alex/work/python/eval_vi/bin/test.py", line 4, in <module>
    from bb_eval_vi.models import *
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/alex/work/python/eval_vi/bb_eval_vi/models/__init__.py", line 77, in <module>
    @zs.reuse("p")
  File "/home/alex/work/python/zhusuan/zhusuan/model/base.py", line 413, in <lambda>
    return lambda f: tf.make_template(scope, f)

Which I think is related to the fact that the model classes have no shapes. However, this makes it quite inconvinient to use like this.

A problem on zs.Concrete

`
import tensorflow as tf
import zhusuan as zs

logits = tf.zeros([10, 10])
temparature = 1.
labels = zs.Concrete('y', logits=logits, temperature=temparature, n_samples=10, group_ndims=1)
prob = labels.log_prob(tf.one_hot(range(10), 10))
print (prob.shape)
`
seems only produce 1 tensor with shape ()

Some example cannot run under Python 3.6

The example examples/semi_supervised_vae/vae_ssl_rws.py cannot run under Python 3.6.

questions about dlgm_nf.py

in line 74-76 in dlgm_nf.py :
qz_samples, log_qz = zs.planar_normalizing_flow(qz_samples, log_qz,
n_iters=n_planar_flows)
qz_samples, log_qz = zs.planar_normalizing_flow(qz_samples, log_qz,
n_iters=n_planar_flows)

why repeat the codes? if we want to implement more layers of normalizing_flow.why not just set the
n_iters directly? e.g.
qz_samples, log_qz = zs.planar_normalizing_flow(qz_samples, log_qz,
n_iters=2*n_planar_flows)

what's the parameters in normalizing_flow? can these parameters also be optimized during
the training process?

Convolutional VAE batch size

Convolutional VAE batch size should be specified in current version. Need to be fixed.

log_pdf in StocahsticGraph's get_output has the same size with the output tensor(Branch model)

In branch model,
the log_pdf in StocahsticGraph's get_output has the same size with the targeted output tensor, which differs from the master setting, in which the log_pdf will be shape (batch_size, n_samples) by reduce_sum.

In #42 of ../zhusuan/variational.py

    lower_bound = model.log_prob(latent_outputs, observed, given) - \
        sum(latent_logpdfs)_

will crash because of adding latent_logpdfs of different shapes.

Ancestral sampling incorrect (?)

Code below should output close to 0 according to the docs, but the output is actually random.

import tensorflow as tf
import zhusuan as zs
if __name__ == "__main__":
  with zs.BayesianNet() as model:
    z = zs.Normal('z', tf.constant(0.), tf.constant(0.))
    x = zs.Normal('x', z, tf.constant(0.))
    spl = x.sample(1000)
  with tf.Session() as sess:
    print(sess.run(tf.reduce_mean(spl)))

Exponential families and natural parameters

It would really great if there is a way of constructing an easy way of switching between standard and natural parameters. This particularly is to address more advanced techniques such as Structured VAEs - https://arxiv.org/abs/1603.06277. However, this would require significant thought on how to incorporate this in the API and so I think a discussion here would be good to make. I have not seen so far a good abstraction for this in any of the existing probabilistic frameworks in the community.

Questions on multivariate distributions class

Dear authors, thanks for the great package. I have some questions here and would like to ask for some help.

Initialize a distribution

In Basic Concepts part of the docs, the code

>>> import zhusuan as zs
>>> a = zs.distributions.Normal(mean=0., logstd=0.)

is easy to understand and I think it represents we create a random variable a~N(0, 0), while

>>> b = zs.distributions.Normal([[-1., 1.], [0., -2.]], [0., 1.])

seems confusing to me. Does it mean we create a 2*2 random variables which form a matrix? If is, what does [0., 1.] means then ?

multivariate distributions

In the Bayesian Neural Networks for Multivariate Regression section of the docs, the code below is a little bit confusing to me.

with zs.BayesianNet(observed=observed) as model:
    ws = []
    for i, (n_in, n_out) in enumerate(zip(layer_sizes[:-1],
                                          layer_sizes[1:])):
        w_mu = tf.zeros([1, n_out, n_in + 1])
        ws.append(
            zs.Normal('w' + str(i), w_mu, std=1.,
                      n_samples=n_particles, group_ndims=2))

The doc says "The last group_ndims number of axes in batch_shape are grouped into a single event". In this case, the batch_shape is [n_out, n_in + 1] and the value_shape is [], setting group_ndims=2 means we group the n_out*(n_in + 1) random variables together so as to evaluate the probability at one time, right?

If the understanding above is correct, then I notice we just create n_out*(n_in + 1) random variables which are independent with each other, we can also treat them as a multivariate Normal random variable with identity covariance matrix. However, what if I want to create a real multivariate Normal random variable with a non-diagonal covariance matrix? I didn't see it in module-zhusuan.distributions.multivariate while this kind class exists in tensorflow (refer to tf.contrib.distributions.MultivariateNormalDiag).

Hope it's not much work and thanks for your time!

PS. I think that the basic concept part of docs may be relatively hard for people who have limited background on this topic (like me), though the expression is mathematically precise indeed. Hope for more vivid figures and easy examples to illustrate the abstract concepts. :)

prettytensor's unimplemented he_init() for conv2d causes forward propagation exploding

Current solution: When using conv2d, apply a patch to

.../python2.7/dist-packages/prettytensor/pretty_tensor_image_methods.py

# init = layers.he_init(size[2] * patch_size, size[3] * patch_size,
#                                activation_fn)
init = layers.xavier_init(size[2] * patch_size, size[3] * patch_size)

This issue will be solved in future after we have our own neural layers and get rid of prettytensor.

HMC parameters settings

in examples: gaussian.py

Define HMC parameters

kernel_width = 0.1
n_chains = 1000
n_iters = 200

what is the diffetence between n_chains and n_iters?

it seems that both of them affect the final sampling quality?

what is exactly the difference if i change them as follows:
n_chains = 100
n_iters = 2000

save and restore models?

Just wondering how to save and restore trained deep generative models in zhusuan?
So that we can deploy or continue to train the previous trained models?
Is it exactly the same with that for general tensorflow models?

HMC sampling bug

hmc_bug.txt

HMC sampling cannot even run for one step.
But if setting yprec=zs.Gamma('yprec', 6., 6., n_samples=n_particles) in bnn and changing corresponding variance shape to be [n_chains], then it can sample well.

can't use sess.run() to a stochastic tensor

Here's the code (may omit some details)

def main():
    @zs.reuse('model')
    def test(observed):
        with zs.BayesianNet(observed=observed) as model:
            x_mean = tf.zeros([1, 2])
            x_logstd = tf.zeros([1, 2])
            x = zs.Normal('x', x_mean, x_logstd, group_event_ndims=1)
        return model,x

    model,x= test({})
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(x))

if __name__ == "__main__":
    main()

Now I want to look into the variable x, but after I run the code nothing happens. The program seems to stuck at some place.

Stochastic Tensor shape crash when some dim is 1

n_particles = tf.placeholder(tf.int32, shape=[])
with zs.BayesianNet(observed=None) as model:
w_mu = tf.zeros([n_particles, 1, 10])
w_logstd = tf.zeros([n_particles, 1, 10])
w = zs.Normal('w', w_mu, w_logstd)
pdb.set_trace()
a = 1
see w.tensor in pdb, found the second dim becomes ? instead of 1

sum of logpdf in advi of variational.py crashes with None

How to query deterministic transformations of StochasticTensor?

Hi,

I'm implementing a model in which I need $f_{MLP}(z)$ in inference. As BayesianNet.query only accepts StochasticTensor, is there any way to do this, or could you consider adding this feature (e.g. by adding a deterministic StochasticTensor wrapper)? I think it's a pretty common use case.

Thanks for the awesome work!

请问哪里能找到zhusuan的中文文档?

test log likelihood probelms

I noted that test_ll (test log likelihood) is evaluated in many examples,:
e.g.
in bayesian_nn.py line 116:
log_likelihood = tf.reduce_mean(zs.log_mean_exp(log_py_xw, 0)) -
tf.log(std_y_train)

in vae.py line 78:
is_log_likelihood = tf.reduce_mean(
zs.is_loglikelihood(log_joint, {'x': x},
{'z': [qz_samples, log_qz]}, axis=0))

while in sbn_rws.py line 148:
test_ll = sess.run(lower_bound,
feed_dict={x: test_x_batch,
n_particles: ll_samples})

my questions are:

why in sbn_rws.py, just use lower_bound to evaluate test_ll , which is much different with others
it seems that the log_likelihood is calculated in various forms, what's the differences between them?
is it possible to use one clean API (e.g. something like zs.is_loglikelihood) to unify them?

no variational parameters

in line 51 of /toy_examples/toy2d_intractable.py :
infer_op = optimizer.minimize(cost)

it seems that the variational parameters( z_mean, z_logstd) were not explicitly specified, how does the
optimizer knows exactly which parameters to optimize?

Provide path_derivative argument to all continuous distributions

Paper: https://arxiv.org/abs/1703.09194
You can just add to any continuous distribution an extra argument path_derivative. When that is true when calculating the _log_prob just wrap the distribution parameters in tf.stop_gradients.

Extending Bayesian NNs to have more hidden layers

Greetings Jiaxin,

I am playing with regression using Bayesian Neural Networks. To do so I use your code (https://github.com/thu-ml/zhusuan/blob/master/examples/tutorials/bayesian_nn.py).

Thank you for writing such a great framework (zhusuan) and for your bayesian_nn code.

I have a question though: Could you give me instructions on how to extend the code to have more layers (let's say, 3 layers instead of only 1)?

Is that complicated to do that? In Edward it is pretty easy but I am not sure in this case.

If it is very easy, I hope to ask you a favor of doing that. Meanwhile if it is not the case, please give me some hints and then I will try to implement.

Thank you for your time, and again, thank you for your great framework!

confusions about lb_z in vae_ssl.py

From line 124 of vae_ssl.py, it seems that lb_z is a zs.variational.elbo object.
however, from line 130:
lb_z = tf.reshape(lb_z, [-1, n_y])
it seems that lb_z is treated as a tensor. so, what exactly lb_z is? how is it calculated?
line 136-137:
unlabeled_lower_bound = tf.reduce_mean(tf.reduce_sum(qy_u * (lb_z - log_qy_u), 1))

which makes me confused.
why use lb_z again here to calculate unlabeled_lower_bound ? what's the math behind the expression?

As according to the ELBO expression, it seems that is should be something like this: unlabeled_lower_bound = tf.reduce_mean(tf.reduce_sum(qy_u * ( logpθ(x,y,z)- log_qy_u), 1))
why here use lb_z instead of something like logpθ(x,y,z)

Abstraction of empirical distribution.

I know this might be a bit strange, but I think it would be useful at least to have an option of an empirical distribution. This would make for instance specifying the "inference" networks more symmetric and include more clearly what the full graphical model is. This will also make it consistent to pass the x as an observed variable when we query for the log-probabilities rather than when we build model. This additionally makes the way of building the forward and backward models having the exact same signatures. Taking the example:

@zs.reuse('model')
def vae(observed, x_dim, z_dim, n_x, n_z_per_x):
    with zs.BayesianNet(observed=observed) as model:
        z_mean = tf.zeros([n_x, z_dim])
        z = zs.Normal('z', z_mean, std=1., group_ndims=1, n_samples=n_z_per_x)
        lx_z = layers.fully_connected(z, 500)
        lx_z = layers.fully_connected(lx_z, 500)
        x_logits = layers.fully_connected(lx_z, x_dim, activation_fn=None)
        x = zs.Bernoulli('x', x_logits, group_ndims=1)
    return model, x_logits

@zs.reuse('variational')
def q_net(observed, x_dim, z_dim, n_x, n_z_per_x):
    with zs.BayesianNet(observed=observed) as variational:
        x = zs.Empirical('x', (n_x, x_dim), dtype=tf.int32)
        lz_x = layers.fully_connected(tf.to_float(x), 500)
        lz_x = layers.fully_connected(lz_x, 500)
        z_mean = layers.fully_connected(lz_x, z_dim, activation_fn=None)
        z_logstd = layers.fully_connected(lz_x, z_dim, activation_fn=None)
        z = zs.Normal('z', z_mean, logstd=z_logstd, group_ndims=1,
                      n_samples=n_z_per_x)
    return variational

Example request

Dear Creators,

I deeply think this is one of the most powerful frameworks for creating graphical models and parameters learning Ive seen so far. Outstanding job.

I kindly wanted to request a few more simple easy to follow BN focused examples, such as classical sprinkler example http://web.eecs.utk.edu/~leparker/Courses/CS594-fall09/Lectures/12-Chapter14b-Oct22.pdf. Potentially with a hidden state twist - would be great in practice.

Best regards

in line 84： x_test = np.random.binomial(1, x_test, size=x_test.shape).astype('float32')
the dtype of x_test is 'float32
in line 110: x = tf.placeholder(tf.int32, shape=[None, n_x], name='x')
the dtype of x is tf.int32, it seems that x_test is not match x.
however, in line 165: feed_dict={x: test_x_batch,
test_x_batch is feed to x , and the program works well, why?
in line 66: lz_x = layers.dropout(lz_x, keep_prob=0.9, is_training=is_training)
The dropout layer is applied after a bn layer.
Considering that they have some similar effects (e.g. regulation), is it a good operation to use both bn and dropout tricks? Any further benefits can we get than just use one of them ?