Date of Original Version

5-2014

Type

Conference Proceeding

Journal Title

Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

First Page

910

Last Page

916

Abstract or Description

We present a definiteness annotation scheme that captures the semantic, pragmatic, and discourse information associated with noun phrases, which we call communicative functions. A survey of the linguistics literature suggests that definiteness does not express a single communicative function but is a grammaticalization of many such functions, for example, identifiability, familiarity, uniqueness, and specificity. Our annotation scheme unifies ideas from previous research on definiteness while attempting to remove redundancy. The scheme encodes the communicative functions of definiteness rather than the grammatical forms of definiteness. We assume that the communicative functions are largely maintained across languages while the grammaticalization of this information may vary. Corpora that are annotated using communicative functions can be used to train classifiers, offering data-driven insights into the grammaticalization of definiteness in different languages. We release our annotated corpora for English and Hindi as well as sample annotations for Hebrew and Russian, together with an annotation manual.

Published In

Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), 910-916.