Date of Original Version
© 2009 Association for Computational Linguistics
Abstract or Description
We consider semi-supervised learning of information extraction methods, especially for extracting instances of noun categories (e.g., ‘athlete,’ ‘team’) and relations (e.g., ‘playsForTeam(athlete,team)’). Semi-supervised approaches using a small number of labeled examples together with many unlabeled examples are often unreliable as they frequently produce an internally consistent, but nevertheless incorrect set of extractions. We propose that this problem can be overcome by simultaneously learning classifiers for many different categories and relations in the presence of an ontology defining constraints that couple the training of these classifiers. Experimental results show that simultaneously learning a coupled collection of classifiers for 30 categories and relations results in much more accurate extractions than training classifiers individually.
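The abstract does not spell out which constraints the ontology imposes. The following is a minimal sketch, assuming two plausible kinds of coupling constraint: categories declared mutually exclusive, and relations whose arguments must belong to specific categories. Under those assumptions it shows how a candidate extraction for one classifier can be checked against beliefs already accepted by the others. The `Ontology` class, the function names, and the entity names in the usage lines are all hypothetical. This is not the authors' implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Ontology:
    """Hypothetical ontology holding the constraints that couple the classifiers."""
    mutex: set[frozenset[str]]                  # category pairs that may not share an instance
    relation_types: dict[str, tuple[str, str]]  # relation -> (arg1 category, arg2 category)


def category_ok(noun: str, category: str,
                beliefs: dict[str, set[str]], onto: Ontology) -> bool:
    """Assumed mutual-exclusion check: reject `noun` as an instance of `category`
    if it already belongs to a category declared exclusive with `category`."""
    return all(frozenset((category, other)) not in onto.mutex
               for other in beliefs.get(noun, set()))


def relation_ok(relation: str, arg1: str, arg2: str,
                beliefs: dict[str, set[str]], onto: Ontology) -> bool:
    """Assumed argument type check: accept relation(arg1, arg2) only if both
    arguments are already believed to belong to the required categories."""
    t1, t2 = onto.relation_types[relation]
    return t1 in beliefs.get(arg1, set()) and t2 in beliefs.get(arg2, set())


def promote_categories(candidates: list[tuple[str, str]],
                       beliefs: dict[str, set[str]],
                       onto: Ontology) -> list[tuple[str, str]]:
    """Accept each (noun, category) candidate that passes the coupling check,
    updating `beliefs` in place so later candidates see earlier decisions."""
    accepted = []
    for noun, category in candidates:
        if category_ok(noun, category, beliefs, onto):
            beliefs.setdefault(noun, set()).add(category)
            accepted.append((noun, category))
    return accepted


if __name__ == "__main__":
    onto = Ontology(
        mutex={frozenset(("athlete", "team"))},
        relation_types={"playsForTeam": ("athlete", "team")},
    )
    beliefs = {"Alice Smith": {"athlete"}}  # hypothetical seed belief
    print(promote_categories(
        [("Rivertown Rovers", "team"), ("Alice Smith", "team")], beliefs, onto))
    # [('Rivertown Rovers', 'team')]
    print(relation_ok("playsForTeam", "Alice Smith", "Rivertown Rovers", beliefs, onto))  # True
    print(relation_ok("playsForTeam", "Rivertown Rovers", "Alice Smith", beliefs, onto))  # False
```

In this sketch, "Alice Smith" is rejected as a `team` because she is already an `athlete`. The reversed `playsForTeam` extraction is rejected because its arguments have the wrong types. An error that a single category classifier would happily accept is thus blocked by knowledge held by the other classifiers.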
Proceedings of the NAACL HLT Workshop on Semi-supervised Learning for Natural Language Processing, 1-9.