Date of Original Version
Abstract or Description
Statistical language models have successfully been used to describe and analyze natural language documents. Recent work applying language models to programming languages is focused on the task of predicting code, while mainly ignoring the prediction of programmer comments. In this work, we predict comments from JAVA source files of open source projects, using topic models and n-grams, and we analyze the performance of the models given varying amounts of background data on the project being predicted. We evaluate models on their comment-completion capability in a setting similar to codecompletion tools built into standard code editors, and show that using a comment completion tool can save up to 47% of the comment typing.
Proceedings of The 51st Annual Meeting of the Association for Computational Linguistics.