Date of Original Version
© 2010 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Abstract or Description
We consider k-median clustering in finite metric spaces and k-means clustering in Euclidean spaces, in the setting where k is part of the input (not a constant). For the k-means problem, Ostrovsky et al. show that if the optimal (k - 1)-means clustering of the input is more expensive than the optimal k-means clustering by a factor of 1/ε^2, then one can achieve a (1 + f(ε))-approximation to the k-means optimal in time polynomial in n and k by using a variant of Lloyd's algorithm. In this work we substantially improve this approximation guarantee. We show that given only the condition that the (k - 1)-means optimal is more expensive than the k-means optimal by a factor 1 + α for some constant α > 0, we can obtain a PTAS. In particular, under this assumption, for any ε > 0 we achieve a (1 + ε)-approximation to the k-means optimal in time polynomial in n and k, and exponential in 1/ε and 1/α. We thus decouple the strength of the assumption from the quality of the approximation ratio. We also give a PTAS for the k-median problem in finite metrics under the analogous assumption. For k-means, we additionally give a randomized algorithm with improved running time of n^O(1) (k log n)^poly(1/ε, 1/α). Our technique also obtains a PTAS under the assumption of Balcan et al. that all (1 + α)-approximations are δ-close to a desired target clustering, in the case that all target clusters have size greater than δn and α > 0 is constant. Note that the motivation of Balcan et al. is that for many clustering problems, the objective function is only a proxy for the true goal of getting close to the target. From this perspective, our improvement is that for k-means in Euclidean spaces we reduce the distance of the clustering found to the target from O(δ) to δ when all target clusters are large, and for k-median we improve the "largeness" condition needed to get exactly δ-close from O(δn) to δn. Our results are based on a new notion of clustering stability.
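As an illustration of the separation assumption stated in the abstract (that the optimal (k - 1)-means cost exceeds the optimal k-means cost by a factor of at least 1 + α), the following minimal sketch computes that ratio on a toy instance. It is not the algorithm of the paper: it uses exhaustive search, which is feasible only for a handful of points, and the function names and the sample data are hypothetical.

```python
from itertools import product


def kmeans_cost(points, labels, k):
    """Sum of squared Euclidean distances from each point to its cluster mean."""
    cost = 0.0
    for c in range(k):
        cluster = [p for p, lab in zip(points, labels) if lab == c]
        if not cluster:
            continue  # an empty cluster contributes nothing
        dim = len(cluster[0])
        mean = [sum(p[d] for p in cluster) / len(cluster) for d in range(dim)]
        cost += sum((p[d] - mean[d]) ** 2 for p in cluster for d in range(dim))
    return cost


def optimal_kmeans_cost(points, k):
    """Brute-force optimum over all k^n assignments (toy inputs only)."""
    return min(
        kmeans_cost(points, labels, k)
        for labels in product(range(k), repeat=len(points))
    )


def separation_ratio(points, k):
    """Ratio OPT_(k-1) / OPT_k; the assumption asks for this to be >= 1 + alpha."""
    opt_k = optimal_kmeans_cost(points, k)
    opt_k_minus_1 = optimal_kmeans_cost(points, k - 1)
    return float("inf") if opt_k == 0 else opt_k_minus_1 / opt_k


# Hypothetical toy data: three well-separated pairs on a line.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0), (20.0, 0.0), (21.0, 0.0)]
alpha = 1.0  # assumed constant, chosen only for this example
ratio = separation_ratio(points, k=3)
print(ratio >= 1 + alpha)
```

On this instance the three natural pairs are far apart, so dropping to two clusters forces two pairs to merge and the ratio is far larger than 1 + α. On data without clear cluster structure the ratio would be close to 1 and the assumption would fail.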
Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS), 2010, 309-318.