Date of Original Version

5-2014

Type

Conference Proceeding

Journal Title

Proceedings of the Language Resources and Evaluation Conference (LREC)

Rights Management

Copyright by the European Language Resources Association

Abstract or Description

Multiword expressions (MWEs) are quite frequent in languages such as English, but their diversity, the scarcity of individual MWE types, and contextual ambiguity have presented obstacles to corpus-based studies and NLP systems addressing them as a class. Here we advocate for a comprehensive annotation approach: proceeding sentence by sentence, our annotators manually group tokens into MWEs according to guidelines that cover a broad range of multiword phenomena. Under this scheme, we have fully annotated an English web corpus for multiword expressions, including those containing gaps.

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License
