Date of Original Version

8-2014

Type

Conference Proceeding

Journal Title

Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

First Page

884

Last Page

894

Abstract or Description

With the rapid development of web-based services, concerns about user privacy have heightened. The privacy policies of websites, which serve as legal agreements between service providers and users, are not easy for people to understand and therefore offer an opportunity for natural language processing. In this paper, we consider a corpus of these policies and tackle the problem of aligning or grouping segments of policies based on the privacy issues they address. A dataset of pairwise judgments from humans is used to evaluate two methods, one based on clustering and another based on a hidden Markov model. Our analysis suggests a five-point gap between system and median-human levels of agreement with a consensus annotation, of which half can be closed with bag-of-words representations and half requires more sophistication.

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Published In

Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, 884-894.