A working example of this can be seen here.
If you guys think this is reasonable, I can make a PR for it as well.
from zhusuan.
Nice idea, and I understand the motivation. What I am wondering is whether this maximizes ease of use, since the only difference when programming the posterior is that you may need to pass 'x' through the observed argument instead of as a separate one.
Despite this I'm still happy to merge it as an optional feature for people who love the symmetry :) Before that let me check how you deal with the shape things.
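For concreteness, the difference being discussed is roughly the following (the signatures are illustrative, loosely following the examples later in this thread):

# Current style: x is passed to the inference network as a plain argument.
variational = q_net(x, z_dim, n_particles)

# Proposed style: x enters through `observed`, symmetric with how other
# variables are observed in the BayesianNet.
variational = q_net({'x': x}, x_dim, z_dim, n_x, n_z_per_x)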
from zhusuan.
Yes, I'm not 100% sure the shapes are correct, so having a look would be good.
Another use case, which I came across just now, is to have a Delta distribution (or a similar marker). This makes it possible to pull some of the deterministic computation out of the model. As an example, some of the other normalizing flows (Householder, for instance) usually create the flow from the "top deterministic" layer before the z_logits. This can additionally make GANs a lot more natural, since the sampling epsilon and the final variable are both inside the BayesianNet and retrievable. As an example (I specifically use it now for a VAE):
@zs.reuse('variational')
def q_net(observed, x_dim, z_dim, n_x, n_z_per_x):
    with zs.BayesianNet(observed=observed) as variational:
        x = zs.Empirical('x', (n_x, x_dim), dtype=tf.int32)
        lz_x = layers.fully_connected(tf.to_float(x), 500)
        lz_x = layers.fully_connected(lz_x, 500)
        h = zs.Delta("h", lz_x)
        z_mean = layers.fully_connected(lz_x, z_dim, activation_fn=None)
        z_logstd = layers.fully_connected(lz_x, z_dim, activation_fn=None)
        z = zs.Normal('z', z_mean, logstd=z_logstd, group_ndims=1,
                      n_samples=n_z_per_x)
    return variational
This now allows me to do:
(z, log_q_z), (h, _) = q_net({"x": x}, x_dim, z_dim, n_x, n_z_per_x) \
    .query(["z", "h"], outputs=True, local_log_prob=True)
Using this a GAN would look like:
@zs.reuse('model')
def gan(observed, x_dim, z_dim, n_x, n_z_per_x):
    with zs.BayesianNet(observed=observed) as model:
        z_mean = tf.zeros([n_x, z_dim])
        z = zs.Normal('z', z_mean, std=1., group_ndims=1, n_samples=n_z_per_x)
        lx_z = layers.fully_connected(z, 500)
        lx_z = layers.fully_connected(lx_z, 500)
        lx_z = layers.fully_connected(lx_z, x_dim, activation_fn=None)
        x = zs.Delta("x", lx_z)
    return model
Note that for continuous variables the log likelihood of the Delta is considered infinite.
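For illustration, a minimal standalone sketch of the intended semantics (names are illustrative; this is not tied to ZhuSuan's actual Distribution base class):

import tensorflow as tf

class DeltaSketch(object):
    """Point mass at a deterministic, possibly upstream-random, tensor."""

    def __init__(self, value):
        self._value = tf.convert_to_tensor(value)

    def sample(self, n_samples=None):
        # Sampling just returns the wrapped tensor, which is what makes
        # deterministic layers retrievable like any other node. The
        # density is a point mass, hence the infinite log likelihood.
        return self._value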
from zhusuan.
I agree this can be an alternative if you want to query some deterministic things through the context. Maybe Deterministic or Implicit is a better name, since it may not be a delta due to randomness in upstream nodes.
from zhusuan.
@botev Btw, if you'd like to make the Implicit node, we would be happy to bring forward the plan for supporting density ratio estimation for implicit distributions (e.g., through a GAN-like discriminator). This would make learning implicit models easier in ZhuSuan. @ssydasheng is happy to help with this.
from zhusuan.
Sure, I don't mind what the name is.
Since it may not be a delta due to randomness in upstream nodes.
My interpretation of the name "delta" was as a conditional delta, in the same way that when we write x = zs.Normal we mean a conditional normal in the graphical model, not a marginal one. On that point, are you comfortable with the 0/1 probabilities and 0/inf log densities for the Implicit?
from zhusuan.
How about a NotImplementedError? Users may not expect an inf in their computation. They can be reminded if they try to use the density of an implicit distribution, which is never a good choice.
from zhusuan.
Yeah, you're right. I mixed up the conditional and the marginal. In that sense I agree that Delta is OK. But I think Implicit is still preferable, considering it's widely used in GAN-related papers.
from zhusuan.
I think that depends on whether x is generated from a random sample, as in a GAN; if it is, then Implicit seems better. If x is just a fixed tensor, then Delta seems good.
from zhusuan.
@ssydasheng I think generally we are talking about things which are deterministic but depend stochastically on something upstream, e.g. each layer of the GAN is Implicit/Delta. I don't think two distributions are needed.
@thjashin I don't think an error is a good idea, since at the moment when you query a model with two variables, requesting the log_prob of one of them returns it for both. E.g.:

(z, log_q_z), (h, _) = q_net({"x": x}, x_dim, z_dim, n_x, n_z_per_x) \
    .query(["z", "h"], outputs=True, local_log_prob=True)

would raise an error, which you don't want. You could issue a warning or alternatively return None.
from zhusuan.
That makes sense. None seems to be a good choice.
from zhusuan.
Hmm, apparently returning None gives an error from the base method:
@add_name_scope
def log_prob(self, given):
    """
    log_prob(given)

    Compute log probability density (mass) function at `given` value.

    :param given: A Tensor. The value at which to evaluate log probability
        density (mass) function. Must be able to broadcast to have a shape
        of ``(... + )batch_shape + value_shape``.

    :return: A Tensor of shape ``(... + )batch_shape[:-group_ndims]``.
    """
    given = self._check_input_shape(given)
    log_p = self._log_prob(given)
    return tf.reduce_sum(log_p, tf.range(-self._group_ndims, 0))
The error arises because the reduce_sum is called on the None. I can either modify the base method as well, or go back to the infinite log probabilities.
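For illustration, keeping None would mean guarding the base method roughly like this (a sketch of the "modify the base method" option, not a committed design):

@add_name_scope
def log_prob(self, given):
    given = self._check_input_shape(given)
    log_p = self._log_prob(given)
    if log_p is None:
        # Implicit/delta distributions expose no usable density;
        # propagate the None instead of reducing over group_ndims.
        return None
    return tf.reduce_sum(log_p, tf.range(-self._group_ndims, 0))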
from zhusuan.
I'm not sure which is better though. @ssydasheng @cjf00000 @korepwx @miskcoo Which type of log_prob do you prefer for implicit/delta distributions: None or inf?
from zhusuan.
I prefer inf with a warning
from zhusuan.
0/inf seems good.
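For concreteness, a minimal sketch of the behaviour being agreed on (the _value attribute is an assumption for illustration, not ZhuSuan's actual code):

import warnings
import numpy as np
import tensorflow as tf

def _log_prob(self, given):
    # Warn the user, then return the degenerate log density:
    # +inf where `given` matches the deterministic value, -inf elsewhere.
    warnings.warn("Evaluating the density of an implicit/delta "
                  "distribution; the result is infinite almost everywhere.")
    match = tf.equal(given, self._value)
    return tf.where(match,
                    tf.fill(tf.shape(match), np.inf),
                    tf.fill(tf.shape(match), -np.inf))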
from zhusuan.
Ok, I will implement that and make a PR. A similar issue, which would also be nice to solve, is to have a Reparametrizable distribution. This would allow anything that is technically a normalizing flow to be part of the model as well.
I would suggest an interface for it something like:
z = zs.Normal("z", ...)
f_z, log_det = func(z)
z_t = zs.Reparametrizable("z_t", f_z, log_det)
It would not allow passing num_samples. This way, I think, it will work out of the box, if I understand correctly how you use the models to bootstrap them.
from zhusuan.
Cool, thanks.
As for Reparameterizable, we discussed this previously but finally decided not to support it (at least not as a first priority). The main reason is that to implement it, passing only func and log_det is not enough: you have to build a bijector, which can compute the inverse func^{-1} to evaluate the density at a given value. We felt all these arguments (bijector, log_det) made the feature useless, because users are required to provide everything and the library only wraps the basic computation in a function. That's why we finally provided a simple implementation of normalizing flows.
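To illustrate why this amounts to asking users to provide everything, the interface such a node would need is essentially a full bijector (a sketch with assumed names, in the spirit of TensorFlow's bijectors; not a ZhuSuan API):

class Bijector(object):
    def forward(self, x):
        # z_t = f(z): the reparameterizing transformation itself.
        raise NotImplementedError()

    def inverse(self, y):
        # z = f^{-1}(z_t): needed to evaluate the density at observed values.
        raise NotImplementedError()

    def log_det_jacobian(self, x):
        # log |det df/dx|: needed for the change-of-variables correction.
        raise NotImplementedError()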
from zhusuan.
So I do agree that technically you need a bijective function. However, you could restrict things, at least for now while it is not a priority, so that a Reparameterizable cannot be part of the observed variables, or so that if it is observed you cannot query for the "root" latent or the log-probability of any of those.
That might not be too easy, I agree; let me have a think about it, and if I come up with some nicer way of doing this I'll make an example and give a proposal.
from zhusuan.
Yep. Some insights are really needed on this feature.
from zhusuan.
Ok, so I think the main issue is that most NFs return the "samples" and the "log_det" simultaneously - that is pretty much the only way to compute things efficiently. This might be a breaking change, but it is worth considering: add a method sample_and_log_prob to the base Distribution class, which by default calls sample and then log_prob. When users call query, you would then check for each variable whether both are requested, and if so call this method. This keeps all existing code backwards compatible, and it allows creating a new distribution which supports querying the log_prob only through that method. It also does not require an inverse model; that can be added later for cases where you have both a forward and an inverse model.
Another option is for the model to have, similar to self._tensor, a self._local_log_prob which facilitates this in a similar fashion. This, in fact, might be easier.
from zhusuan.
So with the second suggestion the normalizing flow example looks like this:
def q_net(x, z_dim, n_particles, n_planar_flows):
    with zs.BayesianNet() as variational:
        lz_x = tf.layers.dense(tf.to_float(x), 500, activation=tf.nn.relu)
        lz_x = tf.layers.dense(lz_x, 500, activation=tf.nn.relu)
        z_mean = tf.layers.dense(lz_x, z_dim)
        z_logstd = tf.layers.dense(lz_x, z_dim)

        def flow(samples, log_samples):
            return zs.planar_normalizing_flow(samples, log_samples,
                                              n_iters=n_planar_flows)

        z = zs.NormalFlow('z', flow, z_mean, logstd=z_logstd,
                          group_ndims=1, n_samples=n_particles)
    return variational
All of the changes that were required can be viewed here: botev@4873d6e
PS: I would also suggest that the NF return only the log_det, so that you don't pass in the base log probability. There are cases where you just want to use the functional form of the flow, and if we have an inverse it can't use log_prob as input.
from zhusuan.
I think the key point here is that you suggest making the normalizing flow a specific distribution, so that an error can be raised when its log_prob is called. This is good to have. But I feel it is better to have sample_and_log_prob implemented only in the flow distribution, because in the current implementation you construct log_prob-related graphs in situations where users may only want tensor.
from zhusuan.
Doesn't TensorFlow skip those graphs during the computation, since if the user does not need them they won't be evaluated? Also, note that this can easily be side-stepped by making it internally a closure.
Another option, as you mentioned, is to have this only for the flow distribution and special-case it in the query method.
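A rough sketch of the closure idea (the method name here is hypothetical; the point is only that the log_prob graph is not built until the closure is invoked):

def sample_and_log_prob_lazy(self):
    samples = self.sample()

    def local_log_prob():
        # The log_prob graph is only constructed if and when this
        # closure is actually called.
        return self.log_prob(samples)

    return samples, local_log_prob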
from zhusuan.
Ok, I think your suggestion is good. I've implemented it here: botev@94e4371. I think this approach is good as it addresses both. I've also added code for an optional inverse model, which allows calculating log_prob if ever needed. Note that I don't think there is any way of not creating the log_det graph for the FlowDistribution (there are also not many use cases where you would want to).
However, we do need to set in stone the interface to the forward and inverse models, and I suggest it is better for these to return just the log_det rather than the sum log_x0 - log_det.
from zhusuan.
Well, actually introducing an inverse model is not necessary for normalizing flows (e.g., for the planar flow the inverse is not available in closed form). So I suggest we leave it for later as part of the TransformedDistribution work, for which it is much harder to form a good API. For the normalizing flow distribution, another thing is the shapes. Note that NF should only be applied to distributions with value_shape [], and it is up to the user how many dimensions of the batch_shape they consider as a group. So instead of applying to the last dimension of batch_shape, we should take group_ndims into consideration.
Note that I don't think there is any way of not creating the log_det graph for the FlowDistribution (there are also not many use cases where you would want to).
I mean you do this for all distributions, because the tensor property is implemented by _tensor_and_log_prob.
Maybe we could leave sample_and_log_prob as a method on the base Distribution, implemented by default by directly passing samples to log_prob, and overridden by the FlowDistribution.
from zhusuan.
So in the second implementation variant here (botev@94e4371) there are a few things:
1. I create _local_log_prob when .tensor is called only for the FlowDistribution. For any other distribution, it is created only if you explicitly request .local_log_prob; otherwise, the graphs are not constructed.
2. The inverse model is left optional (None by default) for the FlowDistribution. If you call log_prob and it is None, an exception is raised; otherwise the probability is calculated accordingly.
3. I've left sample_and_log_prob to exist only in the FlowDistribution, so that 1. is achievable.
As for the shapes, could you maybe give me an example? I'm not sure I understand the issue.
from zhusuan.
I spent some time thinking about this and have an improved version based on the second implementation.
For the base Distribution:

def sample_and_log_prob(self):
    samples = self.sample()
    log_p = self.log_prob(samples)
    return samples, log_p

By default it will call sample and then log_prob.
For FlowDistribution:

def sample_and_log_prob(self):
    samples, log_p = self.base_dist.sample_and_log_prob()
    samples, log_p = planar_normalizing_flow(samples, log_p, self.n_flows)
    return samples, log_p

def _sample(self):
    # Maybe a specialized error is better
    raise NotImplementedError()

def _log_prob(self):
    raise NotImplementedError()

Here sample_and_log_prob is rewritten to use the forward function.
Then in the base StochasticTensor:

@property
def tensor(self):
    try:
        self._tensor = self._distribution.sample()
    except NotImplementedError:
        self._tensor, self._local_log_prob = \
            self._distribution.sample_and_log_prob()
    return self._tensor

How do you like this? This will remove the code about a specific flow distribution from the base classes.
from zhusuan.
Yes, that sounds good to me, and I was also thinking of adding exception handling rather than a check. One thing, however: I do strongly suggest that the flow have the interface:

samples, log_det_j = flow(samples, **kwargs)

so that, following your suggestion, the FlowDistribution becomes:

def sample_and_log_prob(self):
    samples, log_p = self.base_dist.sample_and_log_prob()
    samples, log_det_j = planar_normalizing_flow(samples, self.n_flows)
    return samples, log_p - log_det_j

The reason is that if we add an inverse model where we observe z_t, there is no log_p to pass in. Other than that, if you are also happy with this I can modify my implementation and make another PR for it.
from zhusuan.
Yep. It will be more consistent if everything in the transform module returns (samples, log_det).
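As a purely illustrative toy example of that convention (not part of ZhuSuan), a transform returning (samples, log_det) could look like:

import tensorflow as tf

def scale_flow(samples, scale=2.0):
    # z_t = scale * z over the last (event) axis, so
    # log|det J| = event_size * log(scale), broadcast over the batch.
    event_size = tf.cast(tf.shape(samples)[-1], tf.float32)
    log_det_j = event_size * tf.log(scale)
    return scale * samples, log_det_j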
from zhusuan.