Latent Dirichlet Allocation
Introduction
Latent Dirichlet Allocation (LDA) is a probabilistic generative model of text documents [1]. Each document is modeled as a mixture over a set of "topics," and each topic is a probability distribution over the words in the vocabulary. Variational Bayesian (VB) inference algorithms [1], [2] can learn the set of topics underlying the documents in a corpus. The resulting topic features can then be used for tasks such as text categorization.
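The generative story from [1] can be made concrete with a short sampler. The following is a minimal Python sketch that uses only the standard library. The function names (`generate_corpus`, `sample_dirichlet`, `sample_categorical`) and the hyperparameter defaults `alpha` and `eta` are illustrative choices, not part of this repository.

```python
import random


def sample_dirichlet(alpha, rng):
    # Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_i, 1) draws.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    # Return index k with probability probs[k].
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_corpus(num_docs, doc_length, num_topics, vocab_size,
                    alpha=0.1, eta=0.1, seed=0):
    """Sample a toy corpus from the LDA generative process (hypothetical helper).

    Returns (topics, thetas, docs): topics[k] is a distribution over the
    vocabulary, thetas[d] is document d's topic mixture, and docs[d] is a
    list of word ids.
    """
    rng = random.Random(seed)
    # 1. Each topic beta_k ~ Dirichlet(eta) over the vocabulary.
    topics = [sample_dirichlet([eta] * vocab_size, rng) for _ in range(num_topics)]
    thetas, docs = [], []
    for _ in range(num_docs):
        # 2. Per-document topic proportions theta_d ~ Dirichlet(alpha).
        theta = sample_dirichlet([alpha] * num_topics, rng)
        words = []
        for _ in range(doc_length):
            # 3a. Choose a topic z ~ Categorical(theta_d).
            z = sample_categorical(theta, rng)
            # 3b. Choose a word w ~ Categorical(beta_z).
            words.append(sample_categorical(topics[z], rng))
        thetas.append(theta)
        docs.append(words)
    return topics, thetas, docs
```

For example, `generate_corpus(100, 50, 5, 1000)` yields 100 documents of 50 word ids each. Counting the occurrences of each word id per document turns this into word-count vectors like those described for `batchLDA.m`.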
Included Files
batchLDA.m
- Implements LDA in MATLAB with batch processing of the documents. Takes a set of word-count vectors, one per document in the corpus, and outputs the topic features.
classify.m
- A simple text categorization example that uses the LDA topic features. Requires the Pattern Recognition Toolbox for MATLAB.
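The internals of `batchLDA.m` are not described here. As an illustration of what batch VB inference for LDA involves, the following is a minimal Python sketch of the batch variational Bayes updates in the form given by Hoffman, Blei, and Bach [2]: a per-document E-step that alternates the `phi` and `gamma` updates, followed by the M-step `lambda_kw = eta + sum_d n_dw * phi_dwk`. The function names, the dense count-vector input, the parameter defaults, the initialization, and the fixed iteration counts (in place of a convergence test) are all assumptions, not the interface of `batchLDA.m`. A hand-rolled `digamma` keeps the block free of third-party dependencies.

```python
import math
import random


def digamma(x):
    """Digamma function psi(x) for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    # Use psi(x) = psi(x + 1) - 1/x until x is large enough for the series.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def dirichlet_expectation(params):
    """E[log p_i] for p ~ Dirichlet(params)."""
    total = digamma(sum(params))
    return [digamma(p) - total for p in params]


def batch_vb_lda(counts, num_topics, alpha=0.1, eta=0.01,
                 num_iters=50, e_step_iters=20, seed=0):
    """Batch variational Bayes for LDA (hypothetical helper; e_step_iters >= 1).

    counts: one dense word-count vector per document (all of length V).
    Returns (lam, gammas): lam[k] holds the Dirichlet parameters of topic k
    over the vocabulary, gammas[d] those of document d's topic proportions.
    """
    rng = random.Random(seed)
    num_docs, vocab_size = len(counts), len(counts[0])
    # Random positive initialization of lambda (mean 1, small spread).
    lam = [[rng.gammavariate(100.0, 0.01) for _ in range(vocab_size)]
           for _ in range(num_topics)]
    gammas = [[1.0] * num_topics for _ in range(num_docs)]
    for _ in range(num_iters):
        e_log_beta = [dirichlet_expectation(row) for row in lam]
        # Expected counts sum_d n_dw * phi_dwk, accumulated for the M-step.
        sstats = [[0.0] * vocab_size for _ in range(num_topics)]
        for d, doc in enumerate(counts):
            words = [w for w in range(vocab_size) if doc[w] > 0]
            gamma = [1.0] * num_topics
            phis = {}
            # E-step: alternate the phi and gamma updates for this document.
            for _ in range(e_step_iters):
                e_log_theta = dirichlet_expectation(gamma)
                new_gamma = [alpha] * num_topics
                for w in words:
                    # phi_dwk proportional to exp(E[log theta_dk] + E[log beta_kw]).
                    logits = [e_log_theta[k] + e_log_beta[k][w]
                              for k in range(num_topics)]
                    top = max(logits)
                    weights = [math.exp(v - top) for v in logits]
                    norm = sum(weights)
                    phis[w] = [v / norm for v in weights]
                    # gamma_dk = alpha + sum_w n_dw * phi_dwk
                    for k in range(num_topics):
                        new_gamma[k] += doc[w] * phis[w][k]
                gamma = new_gamma
            gammas[d] = gamma
            for w in words:
                for k in range(num_topics):
                    sstats[k][w] += doc[w] * phis[w][k]
        # M-step: lambda_kw = eta + sum_d n_dw * phi_dwk.
        lam = [[eta + sstats[k][w] for w in range(vocab_size)]
               for k in range(num_topics)]
    return lam, gammas
```

Normalizing each `gammas[d]` to sum to 1 gives the posterior-mean topic proportions of document `d`. This is one plausible per-document topic feature for a classifier such as the one in `classify.m`, though the README does not say which quantity `batchLDA.m` returns.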
License
This code is made available under the MIT License. Please consult the included LICENSE file for complete information.
References
[1] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet Allocation," Journal of Machine Learning Research, vol. 3, pp. 993-1022, 2003.
[2] M. D. Hoffman, D. M. Blei, and F. Bach, "Online Learning for Latent Dirichlet Allocation," in Advances in Neural Information Processing Systems (NIPS), Vancouver, 2010.