Date of Original Version
Proceedings of the Language Resources and Evaluation Conference (LREC)
Copyright by the European Language Resources Association
Abstract or Description
Multiword expressions (MWEs) are quite frequent in languages such as English, but their diversity, the scarcity of individual MWE types, and contextual ambiguity have presented obstacles to corpus-based studies and NLP systems addressing them as a class. Here we advocate for a comprehensive annotation approach: proceeding sentence by sentence, our annotators manually group tokens into MWEs according to guidelines that cover a broad range of multiword phenomena. Under this scheme, we have fully annotated an English web corpus for multiword expressions, including those containing gaps.
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License.