Date of Original Version

10-2010

Type

Conference Proceeding

Journal Title

Proceedings of the Ninth Biennial Conference of the Association for Machine Translation in the Americas

Rights Management

Copyright 2010 AMTA

Abstract or Description

This paper examines the motivation, design, and practical results of several types of human evaluation tasks for machine translation. In addition to considering annotator performance and task informativeness over multiple evaluations, we explore the practicality of tuning automatic evaluation metrics to each judgment type in a comprehensive experiment using the METEOR-NEXT metric. We present results showing clear advantages of tuning to certain types of judgments and discuss causes of inconsistency when tuning to various judgment data, as well as sources of difficulty in the human evaluation tasks themselves.

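The kind of metric tuning described in the abstract can be illustrated with a minimal, hypothetical sketch: choose the metric parameter setting whose segment-level scores correlate best with a set of human judgments. The toy data, the single alpha parameter, and the use of Pearson correlation below are illustrative assumptions only, not the paper's actual METEOR-NEXT tuning setup.

# Illustrative sketch (not the paper's procedure): tune one METEOR-style
# parameter so that metric scores correlate best with human judgments.
# All data and names below are hypothetical placeholders.
import numpy as np

# Hypothetical per-segment match statistics and human adequacy judgments.
segments = [
    {"matches": 8,  "hyp_len": 10, "ref_len": 12},
    {"matches": 5,  "hyp_len": 9,  "ref_len": 10},
    {"matches": 11, "hyp_len": 13, "ref_len": 13},
]
human_scores = np.array([3.0, 2.0, 4.5])  # e.g., adequacy on a 1-5 scale

def fmean(seg, alpha):
    """Parameterized harmonic mean of unigram precision and recall,
    in the style of METEOR's Fmean component."""
    p = seg["matches"] / seg["hyp_len"]
    r = seg["matches"] / seg["ref_len"]
    return (p * r) / (alpha * p + (1 - alpha) * r)

def pearson(x, y):
    return np.corrcoef(x, y)[0, 1]

# Grid search over alpha, keeping the value whose segment scores
# correlate best with the human judgments.
best_alpha, best_corr = None, -2.0
for alpha in np.linspace(0.05, 0.95, 19):
    scores = np.array([fmean(s, alpha) for s in segments])
    corr = pearson(scores, human_scores)
    if corr > best_corr:
        best_alpha, best_corr = alpha, corr

print(f"best alpha = {best_alpha:.2f}, correlation = {best_corr:.3f}")

In practice such tuning would search over several metric parameters and could use rank correlation (e.g., Spearman or Kendall's tau) against each judgment type separately, which is the comparison the paper explores.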