deprecation warnings in LDA example code about example-models HOT 9 OPEN

stan-dev commented on July 24, 2024

deprecation warnings in LDA example code

from example-models.

Comments (9)

cartazio commented on July 24, 2024

Relatedly, what's the correct / recommended way to rewrite the sum over the gammas?

As written, it's increment_log_prob(log_sum_exp(gamma)).

should it be

a) target+= gamma
b) target+= something something gamma
c) something else?

bob-carpenter commented on July 24, 2024

The code is way out of date. It's in

https://github.com/stan-dev/example-models/blob/ec6d329bb5a88fa53e44c28fa01287701660933c/misc/cluster/lda/lda.stan

The current marginalization over the topic (k) for a given word (n) is this:

  for (n in 1:N) {
    real gamma[K];
    for (k in 1:K) 
      gamma[k] <- log(theta[doc[n],k]) + log(phi[k,w[n]]);
    increment_log_prob(log_sum_exp(gamma));  // likelihood
  }

That can be reduced to

  for (n in 1:N)
    target += log_sum_exp(log(theta[doc[n]]) + to_vector(log(phi[ , w[n]])));

It'd be even better to define log_phi in vector form and reuse for each n. It would also be worth doing this for log_theta if the number of words per document is greater than the total number of topics.
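As a rough numerical check of the marginalization above, here is a minimal Python sketch (not Stan, and not part of the example-models repository). The toy values of theta, phi, doc, and w are made up for illustration, and indices are 0-based. It takes the logs once, outside the loop over tokens, and compares the sum of log_sum_exp(gamma) with the log mixture probability computed directly.

```python
import math

def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

# Hypothetical toy values: 2 documents, K = 2 topics, V = 3 words.
theta = [[0.7, 0.3], [0.2, 0.8]]          # theta[m][k]: topic weights per document
phi = [[0.5, 0.4, 0.1], [0.1, 0.2, 0.7]]  # phi[k][v]: word probabilities per topic
doc = [0, 0, 1, 1]                        # document index of each token
w = [0, 2, 1, 2]                          # word index of each token

# Take the logs once and reuse them for every token n.
log_theta = [[math.log(p) for p in row] for row in theta]
log_phi = [[math.log(p) for p in row] for row in phi]

K = len(phi)
log_lik = 0.0
for n in range(len(w)):
    # gamma[k] = log theta[doc[n], k] + log phi[k, w[n]]
    gamma = [log_theta[doc[n]][k] + log_phi[k][w[n]] for k in range(K)]
    log_lik += log_sum_exp(gamma)

# Same quantity on the probability scale, for comparison only.
direct = sum(
    math.log(sum(theta[doc[n]][k] * phi[k][w[n]] for k in range(K)))
    for n in range(len(w))
)
```

Here log_lik and direct should agree up to floating-point error. The same compute-the-logs-once idea would carry over to the Stan program, where the saving comes from not recomputing log for every token.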

bob-carpenter commented on July 24, 2024

@cartazio: Feel free to submit a pull request.

And a warning---you can't really do Bayesian inference for LDA because of the multimodality. You'll see that you won't satisfy convergence diagnostics running in multiple chains, and not just because of label switching.
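To make the label switching part concrete, here is a small Python sketch with made-up toy values (the names relabel, theta_sw, and phi_sw are hypothetical). Renumbering the topics consistently in theta and phi leaves the marginal likelihood unchanged, so with a prior that treats topics symmetrically, different chains can settle on different numberings of the same solution. This only illustrates the label switching symmetry; it does not show the other, genuinely different modes.

```python
import math

def log_lik(theta, phi, doc, w):
    """Log-likelihood of the tokens with the topic of each token summed out."""
    K = len(phi)
    return sum(
        math.log(sum(theta[d][k] * phi[k][v] for k in range(K)))
        for d, v in zip(doc, w)
    )

def relabel(theta, phi, perm):
    """Renumber topics: new topic j is old topic perm[j]."""
    theta_new = [[row[k] for k in perm] for row in theta]
    phi_new = [phi[k] for k in perm]
    return theta_new, phi_new

# Hypothetical toy values: 2 documents, K = 3 topics, V = 3 words.
theta = [[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]]
phi = [[0.5, 0.3, 0.2], [0.2, 0.2, 0.6], [0.1, 0.8, 0.1]]
doc = [0, 0, 1, 1, 1]
w = [0, 2, 1, 1, 0]

theta_sw, phi_sw = relabel(theta, phi, [2, 0, 1])
original = log_lik(theta, phi, doc, w)
switched = log_lik(theta_sw, phi_sw, doc, w)
```

The parameter values differ between the two labelings, but original and switched agree up to floating-point error.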

cartazio commented on July 24, 2024

@bob-carpenter thanks! That's super helpful.

By multi-mode you mean there are different local optima when viewed as an optimization problem, i.e. things are nonconvex? (That is, vary the priors and there will be different local optima in the posterior?) I had to google around to figure out what you meant; https://scholar.harvard.edu/files/dtingley/files/multimod.pdf seemed the most clearly expositional, despite the double-spaced formatting :)

Is there any good reading/references on how the "variational" formulations such as Mallet/VowpalWabbit etc. deal with that issue? Or is it just one of those things that tends to stay hidden in folklore common knowledge?

bob-carpenter commented on July 24, 2024

Yes. I meant local optima by "mode". Nobody can deal with the issue. It's computationally intractable (at least unless P = NP). Run it multiple times with different random seeds and you get different answers. Usually it's only used for exploratory data analysis or to generate features for something else, so the multiple answers aren't a big deal---you just choose one either randomly or with human guidance. Some of the later literature tries to add more informative priors to guide solutions. Some of the early work by Griffiths and Steyvers tried to measure just how different the modes were that the algorithms found with random inits.

cartazio commented on July 24, 2024

Thanks! I'll have to do a bit of digging into this :) Intractability is no surprise; I was slightly imagining it might be interesting to look at the topology of how the different inits / regions of answers connect. Also, what does the term "label switching" mean here?

bob-carpenter commented on July 24, 2024

> On Nov 12, 2017, at 11:01 PM, Carter Tazio Schonwald wrote:
>
> thanks! i'll have to do a bit of digging into this :) intractability is no surprise, i was slightly imagining it might be interesting to look at the topology of how the different inits / regions of answers connect

I don't know of any work characterizing this, even for simpler mixtures than LDA.

cartazio commented on July 24, 2024

Considering that would veer into topological data analysis / computational topology and likely be #P-hard, I’m not surprised. :) What’s the relabeling stuff you mentioned?

bob-carpenter commented on July 24, 2024

For the Griffiths and Steyvers experiment on relating topics across initializations of LDA:

• Steyvers, Mark and Tom Griffiths. 2007. Probabilistic topic models. In Thomas K. Landauer, Danielle S. McNamara, Simon Dennis and Walter Kintsch (eds.), Handbook of Latent Semantic Analysis. Lawrence Erlbaum.

They use a greedy empirical KL-divergence for alignment, which is crude, but useful.
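A greedy KL-based alignment of topics from two runs might look roughly like the Python sketch below. This is an illustration under assumptions, not the procedure from the chapter: the symmetrized KL, the function names, and the toy topic-word distributions phi_a and phi_b are all made up here, and every probability is assumed to be strictly positive.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions; assumes every q[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def greedy_align(phi_a, phi_b):
    """Pair topics of run A with topics of run B, closest pairs first.

    Uses symmetrized KL (an assumption of this sketch) and returns a list
    of (index_in_a, index_in_b, divergence) tuples.
    """
    candidates = sorted(
        (kl(pa, pb) + kl(pb, pa), i, j)
        for i, pa in enumerate(phi_a)
        for j, pb in enumerate(phi_b)
    )
    used_a, used_b, matches = set(), set(), []
    for d, i, j in candidates:
        # Greedy step: accept a pair only if both topics are still unmatched.
        if i not in used_a and j not in used_b:
            matches.append((i, j, d))
            used_a.add(i)
            used_b.add(j)
    return matches

# Hypothetical topic-word distributions from two runs; run B is roughly
# run A with the topics renumbered and slightly perturbed.
phi_a = [[0.70, 0.20, 0.10], [0.10, 0.10, 0.80], [0.30, 0.40, 0.30]]
phi_b = [[0.12, 0.10, 0.78], [0.30, 0.42, 0.28], [0.68, 0.22, 0.10]]

matches = greedy_align(phi_a, phi_b)
mapping = {i: j for i, j, _ in matches}
```

Being greedy, this can lock in an early pairing that a globally optimal assignment would avoid, which fits the "crude, but useful" description.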
