
vae-clustering's Introduction

VAE-Clustering

A collection of experiments that shine light on the VAE (with discrete latent variables) as a clustering algorithm.

We evaluate the unsupervised clustering performance of three closely-related sets of deep generative models:

  1. Kingma's M2 model
  2. A modified-M2 model that implicitly contains a non-degenerate Gaussian mixture latent layer
  3. An explicit Gaussian Mixture VAE model

Details about the three models and the motivation for comparing them are provided in this blog post.
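For orientation, here is a rough sketch of the two endpoints being contrasted (the modified-M2 model sits in between), with y a categorical latent, z a continuous latent, and a uniform categorical prior assumed here for concreteness; the exact definitions live in the blog post and the code, not in this sketch:

    M2:    $y \sim \mathrm{Cat}(1/k)$, $z \sim \mathcal{N}(0, I)$, $x \sim p_\theta(x \mid y, z)$
    GMVAE: $y \sim \mathrm{Cat}(1/k)$, $z \sim \mathcal{N}(\mu_z(y), \sigma_z^2(y))$, $x \sim p_\theta(x \mid z)$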

Results

M2 performs poorly as an unsupervised clustering algorithm. We suspect this is attributable to the conflicting desires to use the categorical variable in the generative model versus the inference model. By implicitly enforcing a proper Gaussian mixture distribution on a hidden layer, the modified-M2 model tips the scale in favor of using the categorical variable in the generative model. By using an explicit Gaussian Mixture VAE model, we enable better inference, which leads to greater stability during training and an even stronger incentive to use the categorical variable in the generative model.
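For reference, here is a minimal sketch (not the repo's evaluation code; names are illustrative) of how unsupervised clustering accuracy is commonly computed for such models: assign each example to argmax_y q(y|x), then match clusters to labels with the Hungarian algorithm from scipy (already a dependency):

    # Illustrative sketch; assumes the number of clusters equals the number of classes (k = 10 for MNIST).
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def cluster_accuracy(qy, labels):
        """qy: (n, k) array of q(y|x); labels: (n,) integer class labels."""
        preds = qy.argmax(axis=1)                    # hard cluster assignment per example
        k = qy.shape[1]
        counts = np.zeros((k, k), dtype=int)         # counts[i, j] = points in cluster i with label j
        for p, l in zip(preds, labels):
            counts[p, l] += 1
        rows, cols = linear_sum_assignment(-counts)  # best cluster-to-label matching
        return counts[rows, cols].sum() / float(len(labels))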

Code set-up

The experiments are implemented using TensorFlow. Since the three aforementioned models share very similar formulations, the shared subgraphs are placed in shared_subgraphs.py. The utils.py file contains some additional functions used during training. The remaining *.py files simply implement the three main model classes and other variants that we tried.

We recommend first reading the Jupyter Notebook on nbviewer in the Chrome browser.

Dependencies

  1. tensorflow
  2. tensorbayes
  3. numpy
  4. scipy

vae-clustering's People

Contributors

ruishu


vae-clustering's Issues

Question in computing z in gmvae.py

Hi Ruishu,

Thanks for your post and the code. I had a question about gmvae.py. I was wondering why you initialize y as y = tf.add(y_, Constant(np.eye(10)[i], name='hot_at_{:d}'.format(i))) and use this y in z[i], zm[i], zv[i] = qz_graph(xb, y), instead of using the recognized distribution qy from qy_logit, qy = qy_graph(xb)? Any rationale behind it?
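A hedged reading from a reader, not the author's answer: since y is categorical with only 10 values, the expectation over q(y|x) can be computed exactly by enumerating every one-hot y, running qz_graph once per value, and weighting the per-y losses by q(y|x) (as the loss line quoted in a later issue does), so y never needs to be sampled. A toy numpy sketch of that exact marginalization, with stand-in values:

    # Toy sketch with made-up numbers, only to show the weighting structure.
    import numpy as np

    k = 10
    qy = np.random.dirichlet(np.ones(k))       # stand-in for q(y|x) on one example
    per_y_loss = np.random.randn(k)            # stand-in for the loss evaluated at each one-hot y
    marginal_loss = np.sum(qy * per_y_loss)    # exact E_{q(y|x)}[loss]; no sampling of y needed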

Line 50 in gm_vae.py

Newbie here. Maybe this is a basic question, but why is it that line 50 in gm_vae.py does

nent = -cross_entropy_with_logits(qy_logit, qy)

Shouldn't we pass in the label, y, as the second parameter?
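A hedged side note, not the author's answer: in the unsupervised setting there is no label to pass, and with the soft distribution qy as the target, the cross-entropy against its own logits equals the entropy of q(y|x); negating it then gives the negative-entropy term the loss needs. A small numpy check of that identity:

    # Check: cross-entropy of softmax(logits) against its own logits equals its entropy.
    import numpy as np

    qy_logit = np.array([2.0, 0.5, -1.0])
    log_qy = qy_logit - np.log(np.sum(np.exp(qy_logit)))   # log-softmax
    qy = np.exp(log_qy)                                     # q(y|x)
    cross_entropy = -np.sum(qy * log_qy)                    # what the op computes with soft labels qy
    entropy = -np.sum(qy * np.log(qy))                      # H(q(y|x))
    assert np.isclose(cross_entropy, entropy)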

blog post

Hey, the blog post is gone. Can you please share some tips on how to debug these kinds of deep generative models? I get the theory, but when something doesn't work I am unable to reason why or figure out how to fix the issue. It would be really useful, as I don't think there's enough documentation on the actual practice of how to make these things work.

Question about loss equation from blog

(attached screenshot: the loss equation from the blog post)
Hi,
Comparing the explicit GMM VAE code with the loss equation from the blog, I'm confused about the missing term
$\mathbb{E}_{q(y, z | x)} \left[ \ln \frac{p(y)}{q(y | x)} \right]$
(the expectation of log(p(y)/q(y|x)), i.e. the first term)
from the loss calculation here:
https://github.com/RuiShu/vae-clustering/blob/14525a9c8cad73b0f9df8ee638b8231253798b1f/shared_subgraphs.py#L31

loss = tf.add_n([nent] + [qy[:, i] * losses[i] for i in xrange(k)])

and in the notebook as well.
It would be great if you could clarify the reason for that
Thanks
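A hedged observation on where that term may have gone (an assumption about the intended derivation, since the blog post itself is not reproduced here): if $p(y)$ is uniform over $k$ classes, then $\mathbb{E}_{q(y, z | x)} [\ln \frac{p(y)}{q(y | x)}] = -\ln k + \mathcal{H}(q(y|x))$. The $-\ln k$ piece is a constant that can be dropped from the objective, and the entropy piece appears, with its sign flipped because the code minimizes a loss rather than maximizing an ELBO, as the nent term in the quoted line. So the term may be present implicitly rather than missing.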

Tensorflow Version?

First just wanted to say that I really enjoyed your blog post on this and look forward to getting it running.
I've been running into what I have assessed to be TensorFlow-version-related errors. I had some issues with the variable scopes throwing errors saying that 'reuse' was used multiple times, or something along those lines. I apologize for not having the exact error message; in my attempts to troubleshoot this I upgraded to TF 1.0, broke things even more, and can't get that error again.
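A hedged workaround for this class of error, assuming a TF 1.4+ install rather than a confirmed fix for this repo: let TensorFlow manage scope reuse automatically instead of threading a reuse flag through every call.

    # Sketch only: AUTO_REUSE (TF >= 1.4) avoids manual "reuse" bookkeeping per scope.
    import tensorflow as tf

    x = tf.placeholder(tf.float32, shape=(None, 784), name='x')
    with tf.variable_scope('qy', reuse=tf.AUTO_REUSE):
        h1 = tf.layers.dense(x, 512, activation=tf.nn.relu, name='layer1')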

error for tensorbayes

Traceback (most recent call last):
File "/home/wac/PycharmProjects/vae-clustering/gmvae.py", line 3, in
from tensorbayes.layers import Constant, Placeholder, Dense, GaussianSample
ImportError: cannot import name Constant

tensorbayes == 0.4.0
tensorflow == 1.7.0
how to fix it?
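A hedged local shim, not an official tensorbayes API: the repo only appears to need a named constant tensor from Constant, which tf.constant can supply directly.

    # Shim for the missing import; the signature is inferred from the repo's usage.
    import numpy as np
    import tensorflow as tf

    def Constant(value, name=None):
        return tf.constant(value, dtype=tf.float32, name=name)

    one_hot_3 = Constant(np.eye(10)[3], name='hot_at_3')   # mirrors the call seen in gmvae.py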

TypeError when running gmvae.py

Hey there, thanks for the great library - looking forward to getting into GMVAEs!

I've cloned the library and its dependencies, but seem to be stuck on this runtime error. I'm on Python 2.7, so I can't see why this shouldn't work out of the box.

Traceback (most recent call last):
File "gmvae.py", line 28, in
x = Placeholder((None, 784), 'x')
File "build/bdist.macosx-10.10-x86_64/egg/tensorbayes/layers/simple.py", line 10, in placeholder
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1332, in placeholder
name=name)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 1748, in _placeholder
name=name)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 683, in apply_op
attr_value.type = _MakeType(value, attr_def)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 177, in _MakeType
(attr_def.name, repr(v)))
TypeError: Expected DataType for argument 'dtype' not 'x'
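A hedged workaround, assuming the installed tensorbayes version expects a dtype argument where the repo passes the name 'x': build the placeholder with TensorFlow directly.

    # Replacement for the failing Placeholder((None, 784), 'x') call.
    import tensorflow as tf

    x = tf.placeholder(tf.float32, shape=(None, 784), name='x')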

what's the difference from this paper?

Hey Rui Shu,
I really learned a lot from your post. Recently I read a candidate paper for ICLR 2019:
https://openreview.net/pdf?id=rygkk305YQ

'Hierarchical Generative Modeling for Controllable Speech Synthesis'

which uses a GMVAE too. I compared it to yours in gmvae.py; the way the paper gets y from the sampled z is quite confusing. Have you read it too?

Poor performance because of bad initialization

This is for those who are trying this vae-clustering repo, which depends on the tensorbayes library. I got very poor performance because of the bad initialization scheme in tensorbayes/layers/simple.py, line 31.

Please replace variance_scaling_initializer with xavier_initializer. Also, don't forget to import it: from tensorflow.contrib.layers import xavier_initializer

After this, the performance is boosted.
Hope this helps later followers.
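A minimal sketch of the swap described above (the exact line inside tensorbayes may differ between versions; shapes here are illustrative):

    # Xavier/Glorot initialization instead of variance scaling, as suggested above.
    import tensorflow as tf
    from tensorflow.contrib.layers import xavier_initializer

    w = tf.get_variable('layer1/w', shape=(784, 512), initializer=xavier_initializer())
    # previously: initializer=tf.contrib.layers.variance_scaling_initializer()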

Other datasets

Hello, I was wondering if you have been able to use GMVAE on any datasets other than MNIST?

Is `qy_graph` correct?

Hi Rui, I was reading your blog and found it very helpful for my understanding of the CVAE paper. Regarding the GMVAE you experimented with at the end, I found that the implementation might have some issues. To be more specific:

the qy_graph (code copied and pasted below)

def qy_graph(x, k=10):
    reuse = len(tf.get_collection(tf.GraphKeys.VARIABLES, scope='qy')) > 0
    # -- q(y)
    with tf.variable_scope('qy'):
        h1 = Dense(x, 512, 'layer1', tf.nn.relu, reuse=reuse)
        h2 = Dense(h1, 512, 'layer2', tf.nn.relu, reuse=reuse)
        qy_logit = Dense(h2, k, 'logit', reuse=reuse)
        qy = tf.nn.softmax(qy_logit, name='prob')
    return qy_logit, qy

returns qy, the distribution parameter of p(y), i.e. the mean of p(y). Following this graph, the generative process of the model becomes

y = E[p(y)]
z ~ N(mu_z(y), var_z(y))
x ~ Ber(mu_x(z))

instead of what is stated in the blog:

y ~ p(y)
z ~ N(mu_z(y), var_z(y))
x ~ Ber(mu_x(z))


Is my understanding correct?

Entropy of q(y|x)

In the code, you maximize the entropy of q(y|x) by minimizing the negative entropy.

I think we want to make y more informative, so we should minimize the entropy instead. What is the intuition behind maximizing it?

And why does the entropy still get reduced by training?

PyTorch implementation?

I was trying to implement the unsupervised M2 model in PyTorch, as I am not so familiar with TensorFlow. It seems you are good with both. Have you implemented it?
