Date of Original Version
Proceedings of COLING 2012: Technical Papers, Martin Kay and Christian Boitet (eds.), 341-356.
Copyright 2012 The COLING 2012 Organizing Committee.
Abstract or Description
In this work we address the challenge of augmenting n-gram language models with prior linguistic intuitions. We argue that the family of hierarchical Pitman-Yor language models is an attractive vehicle for this problem, and demonstrate the approach by proposing a model for German compounds. In our empirical evaluation, the model outperforms a modified Kneser-Ney n-gram model in test set perplexity. When used as part of a translation system, the proposed language model matches the baseline BLEU score for English→German while improving the precision with which compounds are output. We find that an approximate inference technique inspired by the Bayesian interpretation of Kneser-Ney smoothing (Teh, 2006) drastically reduces model training time with negligible impact on translation quality.
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.