Copyright © 1997 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.
Abstract
Data is often stored in summarized form, as a histogram of aggregates (COUNTs, SUMs, or AVGs) over specified ranges. We study how to estimate the original detail data from the stored summary. We formulate this task as an inverse problem, specifying a well-defined cost function to be optimized under constraints. We show that our formulation includes the uniformity and independence assumptions as a special case, and that it achieves better reconstruction when we maximize smoothness rather than uniformity. In experiments on real and synthetic datasets, the proposed method almost always outperforms its competitor, improving the root-mean-square error by up to 20 percent for stock price data and by up to 90 percent for smoother datasets. Finally, we show how to apply this theory to a variety of database problems that involve partial information, such as OLAP, data warehousing, and histograms in query optimization.
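To illustrate the contrast between the uniformity assumption and a smoothness-based reconstruction, here is a minimal, self-contained sketch (not the paper's exact algorithm): given stored bucket SUMs, it compares spreading each SUM evenly across its bucket against a simple heuristic that alternates a smoothing pass with re-projection onto the bucket-sum constraints. The bucket sums and widths are hypothetical illustrations.

```python
def uniform_spread(sums, widths):
    """Uniformity assumption: spread each bucket's SUM evenly
    over the detail values in that bucket."""
    out = []
    for s, w in zip(sums, widths):
        out.extend([s / w] * w)
    return out

def smooth_reconstruct(sums, widths, iters=200):
    """Heuristic smoothness-based reconstruction (a sketch, not the
    paper's method): repeatedly apply a 3-point moving average, then
    rescale each bucket so its values again add up to the stored SUM."""
    x = uniform_spread(sums, widths)
    for _ in range(iters):
        # smoothing pass (clamped at the ends)
        x = [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, len(x) - 1)]) / 3
             for i in range(len(x))]
        # re-projection: restore each bucket's SUM constraint
        pos = 0
        for s, w in zip(sums, widths):
            seg = x[pos:pos + w]
            total = sum(seg)
            if total:
                x[pos:pos + w] = [v * s / total for v in seg]
            else:
                x[pos:pos + w] = [s / w] * w
            pos += w
    return x

def roughness(x):
    """Sum of squared differences between consecutive values
    (smaller means smoother)."""
    return sum((a - b) ** 2 for a, b in zip(x, x[1:]))

sums = [6.0, 30.0, 12.0]   # stored bucket SUMs (hypothetical data)
widths = [3, 3, 3]         # number of detail values per bucket
print(uniform_spread(sums, widths))
print([round(v, 2) for v in smooth_reconstruct(sums, widths)])
```

Both reconstructions honor every bucket's SUM exactly, but the smoothed estimate removes the abrupt jumps at bucket boundaries that the uniformity assumption produces.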
Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB 1997).