Topes: Reusable Abstractions for Validating Data
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ICSE’08, May 10-18, 2008, Leipzig, Saxony, Germany. Copyright 2008 ACM 1-58113-000-0/00/0004…$5.00.
Abstract
Programmers often omit input validation when inputs can appear in many different formats or when validation criteria cannot be precisely specified. To enable validation in these situations, we present a new technique that puts valid inputs into a consistent format and identifies “questionable” inputs, which might be either valid or invalid, so that a person or a program can double-check them. Our technique relies on the concept of a “tope”, an application-independent abstraction that describes how to recognize and transform values in a category of data. We present our definition of topes and describe a development environment that supports implementing and using topes. Experiments with web application and spreadsheet data indicate that our technique improves the accuracy and reusability of validation code, and also improves the effectiveness of subsequent data cleaning such as duplicate identification.