Date of Original Version

7-2012

Type

Conference Proceeding

Journal Title

Proceedings of the Annual Meeting of the Association for Computational Linguistics

First Page

685

Last Page

693

Rights Management

Copyright 2012 ACL

Abstract or Description

We present a statistical model for canonicalizing named entity mentions into a table whose rows represent entities and whose columns are attributes (or parts of attributes). The model is novel in that it incorporates entity context, surface features, first-order dependencies among attribute-parts, and a notion of noise. Transductive learning from a few seeds and a collection of mention tokens combines Bayesian inference and conditional estimation. We evaluate our model and its components on two datasets collected from political blogs and sports news, finding that it outperforms a simple agglomerative clustering approach and previous work.

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.

Published In

Proceedings of the Annual Meeting of the Association for Computational Linguistics, 685-693.