Date of Original Version

8-2010

Type

Conference Proceeding

Journal Title

Coling 2010: Poster Volume

First Page

320

Last Page

328

Abstract or Description

Out-of-vocabulary (OOV) words present a significant challenge for Machine Translation. For low-resource languages, limited training data increases the frequency of OOV words, which degrades translation quality. Past approaches have suggested using stems or synonyms for OOV words. Unlike these methods, we show how to handle not just OOV words but rare words as well in an Example-based Machine Translation (EBMT) paradigm. The presence of OOV words and rare words in the input sentence prevents the system from finding longer phrasal matches and produces low-quality translations due to less reliable language model estimates. The proposed method requires only a monolingual corpus of the source language to find candidate replacements. A new framework is introduced to score and rank the replacements by efficiently combining features extracted for the candidate replacements. A lattice representation scheme allows the decoder to select from a beam of possible replacement candidates. The new framework gives statistically significant improvements in English-Chinese and English-Haitian translation systems.
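
The general idea of finding and ranking replacement candidates from a monolingual corpus can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the two features (shared-context count and relative unigram frequency), the linear weights, and the beam size are all assumptions chosen for illustration, and the abstract does not specify which features or combination method the authors use.

```python
from collections import Counter


def find_candidates(corpus, left, right):
    """Count words that occur between `left` and `right` in the corpus."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        for i in range(1, len(tokens) - 1):
            if tokens[i - 1] == left and tokens[i + 1] == right:
                counts[tokens[i]] += 1
    return counts


def rank_replacements(corpus, vocab, left, right, weights=(1.0, 0.5), beam=2):
    """Return up to `beam` in-vocabulary replacements for a word seen
    between `left` and `right`, ranked by a weighted feature sum.

    Hypothetical features (assumptions, not taken from the paper):
      1. how often the candidate appears in the same context
      2. the candidate's relative unigram frequency in the corpus
    """
    context_counts = find_candidates(corpus, left, right)
    unigrams = Counter(tok for s in corpus for tok in s.split())
    total = sum(unigrams.values())

    scored = []
    for word, ctx_count in context_counts.items():
        if word not in vocab:
            continue  # a replacement must itself be translatable
        score = weights[0] * ctx_count + weights[1] * (unigrams[word] / total)
        scored.append((score, word))

    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored[:beam]]
```

In this sketch the returned list plays the role of the beam of alternatives that a lattice would offer the decoder at the position of the OOV or rare word; a real system would attach the scores to the lattice arcs rather than discard them.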

Published In

Coling 2010: Poster Volume, 320-328.