Date of Original Version

7-2011

Type

Conference Proceeding

Journal Title

Proceedings of the EMNLP Workshop on Unsupervised Learning in NLP (UNSUP)

First Page

2

Last Page

12

Rights Management

Copyright 2011 ACM

Abstract or Description

We present a nonparametric Bayesian approach to extract a structured database of entities from text. Neither the number of entities nor the fields that characterize each entity are provided in advance; the only supervision is a set of five prototype examples. Our method jointly accomplishes three tasks: (i) identifying a set of canonical entities, (ii) inferring a schema for the fields that describe each entity, and (iii) matching entities to their references in raw text. Empirical evaluation shows that the approach learns an accurate database of entities and a sensible model of name structure.

Published In

Proceedings of the EMNLP Workshop on Unsupervised Learning in NLP (UNSUP), 2-12.