Date of Original Version



Conference Proceeding

Rights Management

Copyright 2015 by the authors

Abstract or Description

Column subset selection of massive data matrices has found numerous applications in real-world data systems. In this paper, we propose and analyze two sampling-based algorithms for column subset selection without access to the complete input matrix. To our knowledge, these are the first provably correct algorithms for column subset selection with missing data. The proposed methods work for row/column coherent matrices by employing the idea of adaptive sampling. Furthermore, when the input matrix has a noisy low-rank structure, one of the algorithms enjoys a relative error bound.
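As a rough illustration of the adaptive sampling idea mentioned above, the following sketch picks columns one at a time with probability proportional to their squared residual norm, then projects the chosen direction out of every column. It is not the paper's algorithm: it assumes a fully observed matrix (the paper addresses missing data), and the function name `adaptive_column_sampling` and its parameters are hypothetical.

```python
import random


def adaptive_column_sampling(M, k, seed=0):
    """Pick up to k column indices of M (a list of rows) by adaptive sampling.

    Illustrative assumption: M is fully observed, unlike the paper's setting.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(M), len(M[0])
    # Residual columns start out as the columns of M.
    residual = [[float(M[i][j]) for i in range(n_rows)] for j in range(n_cols)]
    initial_energy = sum(x * x for col in residual for x in col)
    chosen = []
    for _ in range(k):
        # Sampling weights are the squared norms of the current residuals.
        weights = [sum(x * x for x in col) for col in residual]
        if sum(weights) <= 1e-12 * initial_energy:
            break  # the chosen columns already span (numerically) all of M
        j = rng.choices(range(n_cols), weights=weights)[0]
        chosen.append(j)
        # Normalise the sampled residual and project it out of every column.
        norm = weights[j] ** 0.5
        q = [x / norm for x in residual[j]]
        for c in range(n_cols):
            dot = sum(a * b for a, b in zip(q, residual[c]))
            residual[c] = [b - dot * a for a, b in zip(q, residual[c])]
        residual[j] = [0.0] * n_rows  # never resample a chosen column
    return chosen
```

For example, calling `adaptive_column_sampling([[1, 0, 1, 2], [0, 1, 1, 0], [0, 0, 0, 0]], 3)` returns only two indices, because the matrix has rank 2 and the residual vanishes after two linearly independent columns are chosen.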



Published In

Journal of Machine Learning Research: Workshop and Conference Proceedings, 38, 1033-1041.