Abstract or Table of Contents
The need for time-critical analysis and understanding of the underlying group structure in transactional data has been growing in domains such as law enforcement and customs. Kubica et al. (2003) proposed k-groups, an algorithm based on a probabilistic generative model for discovering underlying groups in data. Although k-groups is reported to be significantly faster than its predecessor GDA (Kubica et al., 2002), it remains too slow and memory-intensive for large datasets in practice. This paper presents XGDA, a framework for scalable and robust group discovery. An evaluation comparing XGDA with k-groups shows that XGDA can handle extremely large datasets in reasonable time and yields more robust solutions than k-groups.