
Comments (29)

botev commented on May 20, 2024

A working example of this can be seen here.
If you guys think this is reasonable I can make a PR for it as well.

thjashin commented on May 20, 2024

Nice idea, and I understand the motivation. What I'm wondering is whether this actually maximizes ease of use, since the only difference when programming the posterior is that you may need to pass 'x' through the observed argument instead of as a separate one.
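(For concreteness, the two calling styles being compared are roughly the following; this is illustrative only, reusing the q_net signature from the example further down:)

# x passed through the observed dict, mirroring how the model is written:
variational = q_net({'x': x}, x_dim, z_dim, n_x, n_z_per_x)

# versus x passed as an ordinary, separate argument:
variational = q_net(x, x_dim, z_dim, n_x, n_z_per_x)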

Despite this, I'm still happy to merge it as an optional feature for people who love the symmetry :) Before that, let me check how you handle the shapes.

botev commented on May 20, 2024

Yes, I'm not 100% sure the shapes are correct, so having a look would be good.

Another use case, which I came across just now, is a Delta distribution (or some similar marker). The point is to be able to retrieve some of the deterministic computation from inside the model. For example, some of the other normalizing flows (Householder, for instance) build the flow from the "top deterministic" layer just before the z logits. It can also make GANs a lot more natural, since both the sampling epsilon and the final variable live inside the BayesianNet and are retrievable. As an example (I specifically use it for the VAE right now):

@zs.reuse('variational')
def q_net(observed, x_dim, z_dim, n_x, n_z_per_x):
    with zs.BayesianNet(observed=observed) as variational:
        x = zs.Empirical('x', (n_x, x_dim), dtype=tf.int32)
        lz_x = layers.fully_connected(tf.to_float(x), 500)
        lz_x = layers.fully_connected(lz_x, 500)
        h = zs.Delta("h", h)
        z_mean = layers.fully_connected(lz_x, z_dim, activation_fn=None)
        z_logstd = layers.fully_connected(lz_x, z_dim, activation_fn=None)
        z = zs.Normal('z', z_mean, logstd=z_logstd, group_ndims=1,
                      n_samples=n_z_per_x)
    return variational

This now allows me to do:

(z, log_q_z), (h, _) = q_net({"x": x}, x_dim, z_dim, n_x, n_z_per_x) \
                .query(["z", "h"], outputs=True, local_log_prob=True)

Using this a GAN would look like:

@zs.reuse('model')
def gan(observed, x_dim, z_dim, n_x, n_z_per_x):
    with zs.BayesianNet(observed=observed) as model:
        z_mean = tf.zeros([n_x, z_dim])
        z = zs.Normal('z', z_mean, std=1., group_ndims=1, n_samples=n_z_per_x)
        lx_z = layers.fully_connected(z, 500)
        lx_z = layers.fully_connected(lx_z, 500)
        lx_z = layers.fully_connected(lx_z, x_dim, activation_fn=None)
        x = zs.Delta("x", lx_z)
    return model

Note that for continuous variables the log likelihood of the Delta is considered infinite.
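(For concreteness, a minimal sketch of what a distribution with this convention could look like; the class below is illustrative only, not the proposed ZhuSuan API, and the inf-vs-error question is exactly what is discussed below:)

import numpy as np
import tensorflow as tf

class Delta:
    """Illustrative only: a point mass at a deterministic tensor."""

    def __init__(self, value):
        # The wrapped deterministic tensor, e.g. the output of a layer.
        self.value = value

    def sample(self):
        # "Sampling" just forwards the wrapped tensor.
        return self.value

    def log_prob(self, given):
        # Log density is +inf exactly at the point mass and -inf elsewhere,
        # which is the sense in which the log likelihood of a continuous
        # Delta at its own sample is infinite.
        hit = tf.reduce_all(tf.equal(given, self.value), axis=-1)
        inf = tf.fill(tf.shape(hit), np.inf)
        return tf.where(hit, inf, -inf)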

thjashin commented on May 20, 2024

I agree this can be an alternative if you want to query some deterministic things through the context. Maybe Deterministic or Implicit is a better name, since it may not be a delta due to randomness in upstream nodes?

thjashin commented on May 20, 2024

@botev Btw, if you'd like to make the Implicit node, we may be happy to bring forward the plan to support density ratio estimation for implicit distributions (e.g. through a GAN-like discriminator). This would make learning implicit models easier in ZhuSuan. @ssydasheng is happy to help with this.
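(For reference, the identity behind discriminator-based density ratio estimation: a discriminator D trained to distinguish samples of p from samples of q satisfies, at its optimum,

D(x) = p(x) / (p(x) + q(x)),   so   p(x) / q(x) = D(x) / (1 - D(x)),

i.e. the log density ratio is the logit of the optimal discriminator.)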

botev commented on May 20, 2024

Sure, I don't mind what the name is.

Since it may not be a delta due to randomness in upstream nodes.

My interpretation of the name "delta" was as a conditional delta, in the same way that when we write x = zs.Normal we mean a conditional normal in the graphical model, not a marginal one.

On that point, are you comfortable with the 0/1 probabilities (and 0/inf densities) for the Implicit?

thjashin commented on May 20, 2024

How about a NotImplementedError? Users may not expect an inf in their computation. This way they are reminded if they try to use the density of an implicit distribution, which is almost never a good choice.

thjashin commented on May 20, 2024

Yeah, you're right. I mixed up the conditional and the marginal. In that sense I agree that Delta is ok, but I think Implicit is still preferable, considering it's widely used in GAN-related papers.

ssydasheng commented on May 20, 2024

I think that depends on whether x is generated from a random sample, as in a GAN. If it is, then Implicit seems better; if x is just a fixed tensor, then Delta seems good.

botev commented on May 20, 2024

@ssydasheng I think we are generally talking about things which are deterministic given their inputs but depend stochastically on something upstream. E.g. each layer of the GAN is Implicit/Delta. I don't think two distributions are needed.

@thjashin I don't think an error is a good idea, since at the moment, when you query a model for two variables, if you want the log_prob of one of them it is returned for both. E.g.:

(z, log_q_z), (h, _) = q_net({"x": x}, x_dim, z_dim, n_x, n_z_per_x) \
                .query(["z", "h"], outputs=True, local_log_prob=True)

This would raise an error, which you don't want. You could issue a warning, or alternatively return None.

thjashin commented on May 20, 2024

That makes sense. None seems to be a good choice.

botev commented on May 20, 2024

Hmm, apparently returning None gives an error in the base method:

    @add_name_scope
    def log_prob(self, given):
        """
        log_prob(given)

        Compute log probability density (mass) function at `given` value.

        :param given: A Tensor. The value at which to evaluate log probability
            density (mass) function. Must be able to broadcast to have a shape
            of ``(... + )batch_shape + value_shape``.
        :return: A Tensor of shape ``(... + )batch_shape[:-group_ndims]``.
        """
        given = self._check_input_shape(given)
        log_p = self._log_prob(given)
        return tf.reduce_sum(log_p, tf.range(-self._group_ndims, 0))

The error comes from the reduce_sum being called on the None. I can either modify the base method as well, or go back to infinite log probabilities.

thjashin commented on May 20, 2024

I'm not sure which is better, though. @ssydasheng @cjf00000 @korepwx @miskcoo Which log_prob would you prefer for implicit/delta distributions: None or inf?

ssydasheng commented on May 20, 2024

I prefer inf with a warning

miskcoo commented on May 20, 2024

0/inf seems good.

botev commented on May 20, 2024

Ok, I will implement that and make a PR. A similar issue, which would also be nice to solve, is to have a Reparametrizable distribution. This would allow anything that is technically a normalizing flow to be part of the model as well. I would suggest the interface for that to be something like:

z = zs.Normal("z", ...)
f_z, log_det = func(z)
z_t = zs.Reparametrizable("z_t", f_z, log_det)

It would not allow passing n_samples. This way, I think, it will work correctly out of the box, if I understand correctly how you use the models to bootstrap them.

thjashin commented on May 20, 2024

Cool, thanks.

As for Reparameterizable, we discussed this previously but finally decided not to support it (at least not as a first priority). The main reason is that, to implement it, passing only func and log_det is not enough: you have to build a bijector that can compute the inverse func^{-1} in order to evaluate the density at a given value. We felt that all these arguments (bijector, log_det) made the feature nearly useless, because users are required to provide everything and the library only wraps the basic computation in a function. That's why we finally provided a simple implementation of normalizing flows.
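(For reference, the change-of-variables identity behind this requirement: if z' = f(z) with f invertible, then

log q(z') = log q(f^{-1}(z')) + log |det J_{f^{-1}}(z')|,

so evaluating the density at an arbitrary given value z' requires f^{-1}, not just the forward func and the log_det along samples drawn from the base distribution.)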

botev commented on May 20, 2024

So I do agree that technically you need a bijective function. However, you could restrict things, at least for now while it is not a priority, so that the Reparameterizable cannot be part of the observed variables, or so that if it is observed you cannot query for the "root" latent or the log-probability of either of them.

That might not be too easy, I agree; let me have a think about it, and if I come up with a nicer way of doing this I'll make an example and give a proposal.

thjashin commented on May 20, 2024

Yep. Some insights are really needed on this feature.

botev commented on May 20, 2024

Ok, so I think the main issue is that most NFs return the "samples" and the "log_det" simultaneously; that is pretty much the only way to compute things efficiently. This might be a breaking change and is worth considering, however: add a method sample_and_log_prob to the base Distribution class, which by default calls sample and then log_prob. When users call query, you then check for each variable whether both are requested, and if so call this method. This keeps all existing code backwards compatible, and it allows creating new distributions that support querying the log_prob only through this method. It also does not require an inverse model; that can be added later, when you have both a forward and an inverse model.

Another option is for the model to have, similar to self._tensor, a self._local_log_prob which facilitates this in the same fashion. This, in fact, might be easier.
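(A rough sketch of what this second option could look like on the StochasticTensor side; the attribute and property names here are illustrative, not the final API:)

@property
def local_log_prob(self):
    # Build (and cache) the log-prob graph only on first request; a flow-like
    # distribution would instead fill this cache together with the sample.
    if self._local_log_prob is None:
        self._local_log_prob = self._distribution.log_prob(self.tensor)
    return self._local_log_prob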

botev commented on May 20, 2024

So with the second suggestion the normalizing flow example looks like this:

def q_net(x, z_dim, n_particles, n_planar_flows):
    with zs.BayesianNet() as variational:
        lz_x = tf.layers.dense(tf.to_float(x), 500, activation=tf.nn.relu)
        lz_x = tf.layers.dense(lz_x, 500, activation=tf.nn.relu)
        z_mean = tf.layers.dense(lz_x, z_dim)
        z_logstd = tf.layers.dense(lz_x, z_dim)

        def flow(samples, log_samples):
            return zs.planar_normalizing_flow(samples, log_samples,
                                              n_iters=n_planar_flows)
        z = zs.NormalFlow('z', flow,
                          z_mean, logstd=z_logstd, group_ndims=1, n_samples=n_particles)
    return variational

All of the changes that were required can be viewed here: botev@4873d6e

PS: I would also suggest that the NF return only the log_det, so that you don't have to pass in the base log probability. There are cases where you just want to use the function form of the flow, and if we have an inverse it can't take log_prob as an input.
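(Roughly, the two conventions being contrasted; the signatures here are illustrative:)

# current style: the flow takes and returns the accumulated log-probability
samples, log_q = zs.planar_normalizing_flow(samples, log_q, n_iters=n_flows)

# proposed style: the flow returns only its own log-determinant
samples, log_det = zs.planar_normalizing_flow(samples, n_iters=n_flows)
log_q = log_q - log_det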

thjashin commented on May 20, 2024

I think the key point here is that you suggest making the normalizing flow a specific distribution, so that an error can be raised when its log_prob is called. This is good to have. But I feel it is better to have sample_and_log_prob implemented only in the flow distribution, because in the current implementation you construct log_prob-related graphs in situations where users may only want the tensor.

botev commented on May 20, 2024

Doesn't TensorFlow skip those graphs during the computation, since if the user does not need them they won't be evaluated? Also, note that this can easily be side-stepped by making it internally a closure.
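(One way to read the closure suggestion, purely as an illustration: return the samples together with a function, so the log-prob ops are only added to the graph when someone actually asks for them.)

def sample_and_log_prob(self):
    samples = self._sample()

    def local_log_prob():
        # The log-prob graph is only constructed when this is called.
        return self.log_prob(samples)

    return samples, local_log_prob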

Another option, as you mentioned, is to have this only for the flow distribution and handle it as a specific case in the query method.

botev commented on May 20, 2024

Ok, so I think your suggestion is good. I've also implemented it here: botev@94e4371. I think this approach is good as it addresses both concerns. I've also added code for an optional inverse model, which allows calculating log_prob if ever needed. Note that I don't think there is any way of not creating the log_det graph for the FlowDistribution (and there aren't many use cases where you'd want that anyway).

However, we do need to set in stone the interface for the forward and inverse models, and I suggest it is better for these to return just the log_det rather than the combined log_x0 - log_det.

thjashin commented on May 20, 2024

Well, actually introducing an inverse model is not necessary for normalizing flows (e.g., for the planar flow the inverse is not available in closed form). So I suggest we leave it for later, as part of the TransformedDistribution work, for which it is much harder to form a good API. For the normalizing flow distribution, another thing to consider is shapes. Note that NF should only be applied to distributions with value_shape [], and it is up to the user how many dimensions of the batch_shape they consider as a group. So instead of always applying the flow to the last dimension of batch_shape, we should take group_ndims into consideration.

Note that I don't think there is any way of not creating the log_det graph for the FlowDistribution (also not many use cases when that is the case as well).

I mean you do this for all distributions because the tensor property is implemented by _tensor_and_log_prob.

Maybe we could leave sample_and_log_prob as a method on the base Distribution, which by default is implemented by directly passing samples to log_prob, and is overridden by the FlowDistribution.

botev commented on May 20, 2024

So in the second implementation variant here (botev@94e4371) there are a few things:

  1. I create _local_log_prob when .tensor is called only for the FlowDistribution. For any other distribution, it is created only if you explicitly request .local_log_prob; otherwise the graphs are not constructed.

  2. The inverse model is left optional (None by default) for the FlowDistribution. If you call log_prob and it is None, it raises an exception; otherwise it calculates the probability accordingly.

  3. I've left the sample_and_log_prob to exist only in FlowDistribution so that 1. is achievable.

As for the shapes, could you maybe give me an example? I'm not sure I understand the issue.

thjashin commented on May 20, 2024

I spent some time thinking about this and have an improved version based on the second implementation.

For base distribution

def sample_and_log_prob(self):
    samples = self.sample()
    log_p = self.log_prob(samples)
    return samples, log_p

By default it will call sample and then log_prob

For FlowDistribution,

def sample_and_log_prob(self):
    samples, log_p = self.base_dist.sample_and_log_prob()
    samples, log_p = planar_normalizing_flow(samples, log_p, self.n_flows)
    return samples, log_p

def _sample(self):
    # Maybe a specialized error is better
    raise NotImplementedError()

def _log_prob(self, given):
    raise NotImplementedError()

It was rewritten to use the forward function.

Then, in the base StochasticTensor:

@property
def tensor(self):
    try:
        self._tensor = self._distribution.sample()
    except NotImplementedError:
        self._tensor, self._local_log_prob = \
            self._distribution.sample_and_log_prob()
    return self._tensor

How do you like this?
It would remove the flow-distribution-specific code from the base classes.

botev commented on May 20, 2024

Yes, that sounds good to me, and I was also thinking of adding exception handling rather than a check. One thing, however: I really do suggest that the flow has the interface:

samples, log_det_j = flow(samples, **kwargs)

And then, following your suggestion, in the FlowDistribution:

def sample_and_log_prob(self):
    samples, log_p = self.base_dist.sample_and_log_prob()
    samples, log_det_j = planar_normalizing_flow(samples, self.n_flows)
    return samples, log_p - log_det_j

The reason is that if we add an inverse model, where we observe z_t, there is no log_p to pass in. Other than that, if you are also happy with this, I can modify my implementation and make another PR for it.

thjashin commented on May 20, 2024

Yep. It will be more consistent if everything in the transform module returns (samples, log_det).
