Date of Original Version
Abstract or Table of Contents
Anticipating the availability of large question-answer datasets, we propose a principled, data-driven Instance-Based approach to Question Answering. Most question answering systems incorporate three major steps: classify questions according to answer types, formulate queries for document retrieval, and extract actual answers. Under our approach, strategies for answering new questions are directly learned from training data. We learn models of answer type, query content, and answer extraction from clusters of similar questions. We view the answer type as a distribution, rather than a class in an ontology. In addition to query expansion, we learn general content features from training data and use them to enhance the queries. Finally, we treat answer extraction as a binary classification problem in which text snippets are labeled as correct or incorrect answers. We present a basic implementation of these concepts that achieves a good performance on TREC test data.