Date of Original Version
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 955–964
Abstract or Table of Contents
We describe an approach for acquiring the domain-specific dialog knowledge required to configure a task-oriented dialog system that uses human-human interaction data. The key aspects of this problem are the design of a dialog information representation and a learning approach that supports capture of domain information from in-domain dialogs. To represent a dialog for a learning purpose, we based our representation, the form-based dialog structure representation, on an observable structure. We show that this representation is sufficient for modeling phenomena that occur regularly in several dissimilar taskoriented domains, including informationaccess and problem-solving. With the goal of ultimately reducing human annotation effort, we examine the use of unsupervised learning techniques in acquiring the components of the form-based representation (i.e. task, subtask, and concept). These techniques include statistical word clustering based on mutual information and Kullback-Liebler distance, TextTiling, HMM-based segmentation, and bisecting K-mean document clustering. Withsome modifications to make these algorithms more suitable for inferring the structure of a spoken dialog, the unsupervised learning algorithms show promise.