Date of Original Version
Abstract or Description
Online content has become an important medium for disseminating information and expressing opinions. With its proliferation, users face the problem of missing the big picture in a sea of irrelevant and/or diverse content. In this paper, we address the problem of information organization for online document collections, and provide algorithms that create a structured representation of the otherwise unstructured content. We leverage the expressiveness of latent probabilistic models (e.g., topic models) and non-parametric Bayes techniques (e.g., Dirichlet processes), and give online and distributed inference algorithms that scale to terabyte datasets and adapt the inferred representation as new documents arrive. This paper is an extended abstract of the 2012 ACM SIGKDD best doctoral dissertation award of Ahmed.
Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI 2013).