Topes: Reusable Abstractions for Validating Data

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ICSE'08, May 10-18, 2008, Leipzig, Saxony, Germany.

Programmers often omit input validation when inputs can appear in many different formats or when validation criteria cannot be precisely specified. To enable validation in these situations, we present a new technique that puts valid inputs into a consistent format and that identifies “questionable” inputs which might be valid or invalid, so that these values can be double-checked by a person or a program. Our technique relies on the concept of a “tope”, which is an application-independent abstraction describing how to recognize and transform values in a category of data. We present our definition of topes and describe a development environment that supports the implementation and use of topes. Experiments with web application and spreadsheet data indicate that using our technique improves the accuracy and reusability of validation code and also improves the effectiveness of subsequent data cleaning such as duplicate identification.
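To make the idea of recognizing and transforming values concrete, the following is a minimal illustrative sketch, not the paper's implementation. It assumes a tope can be approximated as a set of formats, each with a scoring function (1.0 for clearly valid, 0.0 for invalid, anything in between for "questionable") and a function that rewrites the value into one consistent format. The names `Format`, `PHONE_FORMATS`, and `validate`, the phone-number category, and the choice to treat ten bare digits as questionable are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Format:
    """One hypothetical format within a category of data."""
    name: str
    score: Callable[[str], float]       # 1.0 valid, 0.0 invalid, between = questionable
    to_canonical: Callable[[str], str]  # rewrite a matching value into the consistent format


def _canonical_phone(value: str) -> str:
    # Keep only the digits, then emit the assumed canonical form ###-###-####.
    digits = re.sub(r"\D", "", value)
    return f"{digits[0:3]}-{digits[3:6]}-{digits[6:10]}"


def _matcher(pattern: str, score: float) -> Callable[[str], float]:
    # Build a scoring function that returns `score` on a full match, else 0.0.
    compiled = re.compile(pattern)
    return lambda value: score if compiled.fullmatch(value) else 0.0


# Assumed formats for a phone-number category; the 0.5 score is an arbitrary
# illustration of an input that should be double-checked.
PHONE_FORMATS: List[Format] = [
    Format("dashed", _matcher(r"\d{3}-\d{3}-\d{4}", 1.0), _canonical_phone),
    Format("parenthesized", _matcher(r"\(\d{3}\) ?\d{3}-\d{4}", 1.0), _canonical_phone),
    Format("bare digits", _matcher(r"\d{10}", 0.5), _canonical_phone),
]


def validate(value: str, formats: List[Format]) -> Tuple[str, Optional[str]]:
    """Classify a value and, if it is recognized, return it in the consistent format."""
    value = value.strip()
    best = max(formats, key=lambda fmt: fmt.score(value))
    score = best.score(value)
    if score == 0.0:
        return ("invalid", None)
    status = "valid" if score == 1.0 else "questionable"
    return (status, best.to_canonical(value))
```

Under these assumptions, `validate("(412) 555-1234", PHONE_FORMATS)` yields a valid result in the dashed form, while `validate("4125551234", PHONE_FORMATS)` is reformatted the same way but flagged as questionable so that a person or program can review it.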


This work was funded in part by the EUSES Consortium via NSF (ITR-0325273) and by NSF under Grants CCF-0438929 and CCF-0613823. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the sponsors.