Date of Original Version
Proceedings of the Workshop on Statistical Machine Translation (SMT)
Copyright 2011 ACL
Abstract or Description
A growing body of machine translation research aims to exploit lexical patterns (e.g., ngrams and phrase pairs) with gaps (Simard et al., 2005; Chiang, 2005; Xiong et al., 2011). Typically, these “gappy patterns” are discovered using heuristics based on word alignments or local statistics such as mutual information. In this paper, we develop generative models of monolingual and parallel text that build sentences using gappy patterns of arbitrary length and with arbitrarily many gaps. We exploit Bayesian nonparametrics and collapsed Gibbs sampling to discover salient patterns in a corpus. We evaluate the patterns qualitatively and also add them as features to an MT system, reporting promising preliminary results.
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.
Proceedings of the Workshop on Statistical Machine Translation (SMT), 512-522.