Date of Original Version



Conference Proceeding

Abstract or Description

We address the problem of identifying key domain concepts
automatically from an unannotated corpus of goal-oriented
human-human conversations. We examine two clustering
algorithms, one based on mutual information and the other
based on Kullback-Leibler distance. In order to compare the
results from both techniques quantitatively, we evaluate the
outcome clusters against reference concept labels using
precision and recall metrics adopted from the evaluation of
topic identification tasks. However, since our system allows
more than one cluster to be associated with each concept, an
additional metric, a singularity score, is added to better capture
cluster quality. Based on the proposed quality metrics, the
results show that Kullback-Leibler-based clustering
outperforms mutual information-based clustering for both the
optimal quality and the quality achieved using an automatic
stopping criterion.
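
To make the distance measure named in the abstract concrete, the following is a minimal sketch of how a Kullback-Leibler distance between two words might be computed from their contexts. The abstract does not say how the distance is defined, so several choices here are assumptions made for illustration only: the symmetric form KL(p||q) + KL(q||p), the use of the immediately following word as the context, the additive smoothing constant `alpha`, and the toy utterances. The helper names `context_distribution`, `kl_divergence`, and `kl_distance` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def context_distribution(tokens, word, vocab, alpha=0.5):
    """Smoothed distribution over the words that immediately follow `word`.

    Assumption: right-neighbour contexts with additive smoothing (alpha),
    so that every probability is non-zero and the KL terms stay finite.
    """
    counts = Counter(nxt for cur, nxt in zip(tokens, tokens[1:]) if cur == word)
    total = sum(counts.values()) + alpha * len(vocab)
    return {v: (counts[v] + alpha) / total for v in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same vocabulary."""
    return sum(p[v] * math.log(p[v] / q[v]) for v in p)


def kl_distance(p, q):
    """Symmetric variant, assumed here: KL(p || q) + KL(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


# Toy corpus (invented for illustration, not data from the paper).
tokens = (
    "i want a flight to boston "
    "i want a flight to denver "
    "i need a hotel in boston"
).split()
vocab = sorted(set(tokens))

want = context_distribution(tokens, "want", vocab)
need = context_distribution(tokens, "need", vocab)
flight = context_distribution(tokens, "flight", vocab)

d_want_need = kl_distance(want, need)      # both words are followed by "a"
d_want_flight = kl_distance(want, flight)  # followed by different words
```

In this sketch, words that occur in similar contexts (`want` and `need`) end up closer than words that do not (`want` and `flight`), which is the property a distance-based clustering procedure would rely on when merging words into candidate concepts.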