Date of Original Version



Conference Proceeding

Rights Management

Copyright © SIAM

Abstract or Description

We present a regularization technique based on pseudo-observed variables for latent variable mixed-membership models. It provides a mechanism to impose preferences on the characteristics of aggregate functions of latent and observed variables. We use the framework to regularize topic models, which are latent variable mixed-membership models for language modeling. In many domains, documents and words exhibit only a slight degree of mixed-membership behavior, which is poorly captured by topic models that are overly liberal in permitting such behavior. The regularization introduced in the paper is used to control the degree of polysemy of words permitted by topic models and to prefer sparsity in the topic distributions of documents, in a manner much more flexible than modifying priors allows. The utility of the regularization in exploiting sentiment-indicative features is evaluated internally using document perplexity and externally by using the models to predict star counts in movie and product reviews from the content of the reviews. Our experiments show that using the regularization to finely control the behavior of topic models leads to better perplexity and lower mean squared error in the star-prediction task.





Published In

Proceedings of the 2013 SIAM International Conference on Data Mining, 414-422.