Date of Original Version
This is the author’s version of a work that was accepted for publication. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version is available at http://dx.doi.org/10.1016/j.jvlc.2010.08.003
Abstract or Description
To help people find code that they might want to reuse, repositories of end-user code typically sort scripts by number of downloads, ratings, or other information based on prior uses of the code. However, this information is unavailable when code is new or when it has not yet been reused. Addressing this problem requires identifying reusable code based solely on information that exists when a script is created. To provide such a model for web macro scripts, we identified script traits that might plausibly predict reuse, then used IBM CoScripter repository logs to statistically test how well each corresponded to actual reuse. These tests confirmed that the traits generally did correspond to higher levels of reuse as anticipated. We then developed a machine learning model that uses these traits as features to predict reuse of macros. Evaluating this model on repository logs showed that its accuracy is comparable to that of existing machine learning models for predicting reuse—but with a much simpler structure. Sensitivity analysis revealed that our model is quite robust; its quality is greatly reduced only when parameters are set to such extreme values that the model becomes inordinately selective. Testing the model with individual traits revealed those that provided the best predictions on their own. Based on these results, we outline opportunities for using our model to improve repositories of end-user code.
Journal of Visual Languages and Computing, 21, 5, 277-291.