Date of Original Version



Working Paper

Rights Management

All Rights Reserved

Abstract or Description

Optimal clustering is a notoriously hard task. Recently, several papers have suggested a new approach to clustering, motivated by examining natural assumptions that arise in practice, or that are made implicitly by many standard algorithmic approaches. These assumptions concern various measures of stability of our given clustering instance. The work of Bilu and Linial [BL10] refers to stability with respect to perturbations in the metric space, and gives positive results for inputs stable to perturbations of size O(n1/3) for max-cut based clustering. The work of Balcan et al [BBG09] refers to stability with respect to approximations of the objective, and gives positive results for inputs such that all (1 + α)-approximations to the k-median (or k-means) optimal solution are close, as partitions of the data, to the actual desired clustering. Related assumptions are considered by Ostrovsky et al. [ORSS06]. In this paper we extend these directions in several importantways. For the Bilu-Linial assumption, we show that for center-based clustering objectives (such as k-median, k-means, and k-center) we can efficiently find the optimal clustering assuming only stability to constant-magnitude perturbations of the underlying metric. For the approximation assumption of Balcan et al., we relax their condition to require good behavior only for “natural” (1 + α)-approximations of the objective: specifically, to make assumptions only about approximations in which each point is assigned to its nearest center. This relaxed assumption now allows for clusters in the optimum solution to be quite close together. Nonetheless, we show that for the k-median objective, for any α > 0, under the assumption that all such approximations yield the same partitioning it is still possible to obtain an optimal solution in polynomial time. In addition, we show extensions of this result to infinite metric spaces for any α ≥ 1.