Date of Original Version
© ACM, 1992. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution.
Abstract or Description
We consider the problem of exploiting parallelism to accelerate the performance of spacial access methods and specifically, R-trees . Our goal is to design a server for spatial data, so that to maximize the throughput of range queries. This can be achieved by (a) maximizing parallelism for large range queries, and (b) by engaging as few disks as possible on point queries . We propose a simple hardware architecture consisting of one processor with several disks attached to it. On this architecture, we propose to distribute the nodes of a traditonal R-tree, with cross-disk pointers (“Multiplexed” R-tree). The R-tree code is identical to the one for a single-disk R-tree, with the only addition that we have to decide which disk a newly created R-tree node should be stored in. We propose and examine several criteria to choose a disk for a new node. The most successful one, termed “proximity index” or PI, estimates the similarity of the new node with the other R-tree nodes already on a disk, and chooses the disk with the lowest similarity. Experimental results show that our scheme consistently outperforms all the other heuristics for node-to-disk assignments, achieving up to 55% gains over the Round Robin one. Experiments also indicate that the multiplexed R-tree with PI heuristic gives better response time than the disk-stripping (=“Super-node”) approach, and imposes lighter load on the I/O sub-system. The speed up of our method is close to linear speed up, increasing with the size of the queries.
Proceedings of the 1992 ACM SIGMOD international Conference on Management of Data. M. Stonebraker, Ed. , 195-204.