Date of Original Version

10-2014

Type

Conference Proceeding

Journal Title

Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)

First Page

1001

Last Page

1012

Rights Management

Copyright 2014 ACL

Abstract or Description

We describe a new dependency parser for English tweets, TWEEBOPARSER. The parser builds on several contributions: new syntactic annotations for a corpus of tweets (TWEEBANK), with annotation conventions informed by the domain; adaptations to a statistical parsing algorithm; and a new approach to exploiting out-of-domain Penn Treebank data. Our experiments show that the parser achieves over 80% unlabeled attachment accuracy on our new, high-quality test set, and they measure the benefit of each of our contributions.
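Unlabeled attachment accuracy (often reported as unlabeled attachment score, UAS) is the fraction of tokens whose predicted syntactic head matches the gold-standard head, ignoring dependency labels. The sketch below is a generic, micro-averaged computation over a corpus, assuming heads are encoded as integers (0 for the root, otherwise the 1-based position of the head token). The function name and encoding are illustrative assumptions; the sketch does not reproduce any tweet-specific evaluation conventions the authors may apply.

```python
from typing import List, Sequence


def unlabeled_attachment_score(
    gold: Sequence[Sequence[int]], pred: Sequence[Sequence[int]]
) -> float:
    """Micro-averaged UAS: correct heads / total tokens across all sentences.

    Hypothetical helper for illustration. Each sentence is a list of head
    indices, where 0 marks the root and k > 0 is the 1-based head position.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted corpora differ in sentence count")
    correct = 0
    total = 0
    for gold_heads, pred_heads in zip(gold, pred):
        if len(gold_heads) != len(pred_heads):
            raise ValueError("sentence length mismatch between gold and predicted")
        # Count tokens whose predicted head equals the gold head.
        correct += sum(1 for g, p in zip(gold_heads, pred_heads) if g == p)
        total += len(gold_heads)
    if total == 0:
        raise ValueError("cannot score an empty corpus")
    return correct / total


if __name__ == "__main__":
    gold: List[List[int]] = [[2, 0], [0, 1, 1]]
    pred: List[List[int]] = [[2, 0], [0, 1, 2]]
    print(unlabeled_attachment_score(gold, pred))  # 4 of 5 heads correct
```

Micro-averaging weights every token equally, so longer sentences contribute more to the score than short ones; a per-sentence (macro) average would instead weight each sentence equally.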

Our dataset and parser can be found at http://www.ark.cs.cmu.edu/TweetNLP.


Published In

Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 1001-1012.