Date of Original Version
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers
Abstract or Description
With the rapid development of web-based services, concerns about user privacy have heightened. The privacy policies of online websites, which serve as a legal agreement between service providers and users, are not easy for people to understand and therefore offer an opportunity for natural language processing. In this paper, we consider a corpus of these policies, and tackle the problem of aligning or grouping segments of policies based on the privacy issues they address. A dataset of pairwise judgments from humans is used to evaluate two methods, one based on clustering and another based on a hidden Markov model. Our analysis suggests a five-point gap between system and median-human levels of agreement with a consensus annotation, of which half can be closed with bag-of-words representations and half requires more sophistication.
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, 884-894.