Date of Original Version
Copyright © 1988 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.
Abstract or Description
High capacity disks, especially optical ones, are commercially available. These disks are ideal for archiving large text data bases. In this work, we examine efficient searching techniques for such applications. We propose a unifying framework, which reveals the similarities between signature files and an inverted file using a hash table. Then, we design methods that combine the ease of insertion of the signature files with the fast retrieval of the inverted files. We develop analytical models for their performance and we verify it through experimentation on a 2.8 Mb data base. The agreement between theory and experimentation is very good. The results show that the proposed methods achieve fast retrieval, they require a modest 10%-30% space overhead, (as opposed to 50%- 300% overhead  for the inverted files), and they do not require re-writing; thus, they can handle insertions easily, they permit searches during an insertion and they can be used with write-once optical disks. Using our verified model, the performance predictions for the proposed methods on large data bases (e.g., 250 Mb) are very promising.
Proceedings of 14th International Conference on Very Large Data Bases.