Date of Original Version

10-2010

Type

Conference Proceeding

Journal Title

Proceedings of the Ninth Biennial Conference of the Association for Machine Translation in the Americas

Rights Management

Copyright 2010 AMTA

Abstract or Description

Morphologically rich languages pose a challenge for statistical machine translation (SMT), and the challenge is magnified when translating into a morphologically rich language. In this work we address this challenge in the framework of a broad-coverage English-to-Arabic phrase-based statistical machine translation (PBSMT) system. We explore the full spectrum of Arabic segmentation schemes, ranging from full word forms to fully segmented forms, and examine their effects on system performance. Our results show a difference of 2.61 BLEU points between the best and worst segmentation schemes, indicating that the choice of segmentation scheme has a significant effect on the performance of a PBSMT system in a large-data scenario. We also show that a simple segmentation scheme can perform as well as the best, more complicated segmentation scheme. Finally, we report results on a wide set of techniques for recombining the segmented Arabic output.
