Date of Original Version

12-2007

Type

Technical Report

Abstract or Table of Contents

Storage management is usually handled by skilled system administrators. The specific task of configuring and allocating disk space for applications, often referred to as storage system design, is especially time-consuming and error-prone. Automated storage system design, a solution proposed by many, relies on fast and accurate performance predictions. However, challenges with conventional performance modeling have prevented such automation from being fully realized in practice.

Relative fitness is a new approach to modeling the performance of storage systems. Conventional models, referred to in this dissertation as absolute models, predict the performance of a storage system from the characteristics of a workload. Relative fitness models instead predict performance differences as workloads are moved across storage systems. This offers two primary advantages. First, because a relative fitness model is constructed for each pair of storage systems, it can capture the feedback of a closed workload (e.g., how the I/O arrival rate changes as the workload moves from storage system A to storage system B). Second, relative fitness models can use observed performance and resource utilization as predictors in addition to workload characteristics. This is beneficial when workload characteristics are difficult to obtain or to express concisely. For example, rather than trying to describe the spatio-temporal characteristics of a workload, one could use the observed performance and cache hit rate of storage system A to help predict the performance of storage system B.
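The idea of predicting B from what was observed on A can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the feature set (`bandwidth_mb_s`, `cache_hit_rate`, `request_kb`), the definition of relative fitness as the ratio of bandwidth on B to bandwidth on A, the use of ordinary least squares as a stand-in for a black-box learner, and every function name are hypothetical. The training data at the bottom is synthetic.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Observation:
    """Measurements taken while a workload ran on storage system A (hypothetical fields)."""
    bandwidth_mb_s: float  # observed performance on A
    cache_hit_rate: float  # observed resource utilization on A
    request_kb: float      # conventional workload characteristic


def features(o: Observation) -> List[float]:
    # The leading 1.0 is the intercept term of the linear model.
    return [1.0, o.bandwidth_mb_s, o.cache_hit_rate, o.request_kb]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_relative_fitness(obs_a: Sequence[Observation], perf_b: Sequence[float]) -> List[float]:
    """Least-squares weights w such that features(o) . w approximates perf_B / perf_A."""
    X = [features(o) for o in obs_a]
    y = [pb / o.bandwidth_mb_s for o, pb in zip(obs_a, perf_b)]  # relative fitness targets
    k = len(X[0])
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def predict_on_b(w: Sequence[float], o: Observation) -> float:
    """Predicted bandwidth on B = bandwidth observed on A x predicted relative fitness."""
    rf = sum(wi * xi for wi, xi in zip(w, features(o)))
    return o.bandwidth_mb_s * rf


# Synthetic ground truth (made-up): B delivers (0.5 + 0.4 * A's hit rate) of A's bandwidth.
train = [Observation(bw, hit, req) for bw, hit, req in
         [(100, 0.1, 4), (200, 0.5, 8), (150, 0.9, 64), (80, 0.3, 16), (120, 0.7, 32), (60, 0.2, 128)]]
perf_b = [o.bandwidth_mb_s * (0.5 + 0.4 * o.cache_hit_rate) for o in train]
w = fit_relative_fitness(train, perf_b)
print(round(predict_on_b(w, Observation(90, 0.6, 8)), 2))  # prints 66.6
```

Regressing the ratio rather than B's absolute bandwidth is one way to express "performance differences"; because A's observed bandwidth and cache hit rate are inputs, the model uses performance and utilization as predictors alongside the conventional request-size characteristic.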

This dissertation describes the steps necessary to build a relative fitness model, using an approach general enough to work with any black-box modeling technique. Relative fitness models and absolute models are compared across a variety of workloads and disk arrays (RAID). Compared to absolute models, relative fitness models reduce prediction error by up to 53% for bandwidth, 23% for throughput, and 20% for latency. In general, the best predictors in the relative fitness models are performance observations, followed by conventional workload characteristics.

Relative fitness models can be used in automated storage system design in much the same way as absolute models. Specifically, workloads can be observed on the storage systems to which they are initially assigned, relative fitness models can use these observations to predict the performance of different assignments, and optimization techniques can then select the assignment that optimizes performance.
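The assignment step can be illustrated with a small exhaustive search. This is a hedged sketch, not the dissertation's optimizer: `best_assignment`, the per-system `capacity` limit, and the `predict` callable (which in practice would wrap a relative fitness model for each pair of storage systems) are hypothetical, and the workload names and numbers are made up. Each workload is predicted independently, so interference between workloads sharing a system is ignored.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

# predict(workload, observed_on, candidate) -> predicted bandwidth on candidate.
# Hypothetical interface; a real one would consult the model for (observed_on, candidate).
Predictor = Callable[[str, str, str], float]


def best_assignment(
    current: Dict[str, str],  # workload -> storage system it was observed on
    systems: List[str],       # candidate storage systems
    capacity: int,            # max workloads per system (hypothetical constraint)
    predict: Predictor,
) -> Tuple[Dict[str, str], float]:
    """Try every assignment; return the one with the highest total predicted bandwidth."""
    names = list(current)
    best: Optional[Dict[str, str]] = None
    best_score = float("-inf")
    for choice in product(systems, repeat=len(names)):
        if any(choice.count(s) > capacity for s in systems):
            continue  # violates the per-system capacity limit
        # Simplification: no interference between co-located workloads is modeled.
        score = sum(predict(w, current[w], s) for w, s in zip(names, choice))
        if score > best_score:
            best, best_score = dict(zip(names, choice)), score
    if best is None:
        raise ValueError("no assignment satisfies the capacity constraint")
    return best, best_score


# Made-up predictions for two workloads both observed on system A.
table = {
    ("oltp", "A", "A"): 100.0, ("oltp", "A", "B"): 150.0,
    ("scan", "A", "A"): 300.0, ("scan", "A", "B"): 200.0,
}
plan, total = best_assignment({"oltp": "A", "scan": "A"}, ["A", "B"], 1,
                              lambda w, src, dst: table[(w, src, dst)])
print(plan, total)  # prints {'oltp': 'B', 'scan': 'A'} 450.0
```

Exhaustive search grows as (number of systems)^(number of workloads), so it only suits toy cases; a practical design tool would substitute one of the optimization techniques mentioned above while keeping the same observe-then-predict structure.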
