Proceedings of the Conference on Computational Language Learning
Copyright 2014 Association for Computational Linguistics
Abstract or Description
We present a Bayesian formulation for weakly-supervised learning of a Combinatory Categorial Grammar (CCG) supertagger with an HMM. We assume supervision in the form of a tag dictionary, and our prior encourages the use of cross-linguistically common category structures as well as transitions between tags that can combine locally according to CCG’s combinators. Our prior is theoretically appealing since it is motivated by language-independent, universal properties of the CCG formalism. Empirically, we show that it yields substantial improvements over previous work that used similar biases to initialize an EM-based learner. Additional gains are obtained by further shaping the prior with corpus-specific information that is extracted automatically from raw text and a tag dictionary.
Proceedings of the Conference on Computational Language Learning, 141-150.