#### Date of Original Version

7-2008

#### Type

Conference Proceeding

#### Abstract or Description

We continue the investigation of natural conditions for a similarity function to allow learning, without requiring the similarity function to be a valid kernel, or referring to an implicit high-dimensional space. We provide a new notion of a “good similarity function” that builds upon the previous definition of Balcan and Blum (2006) but improves on it in two important ways. First, as with the previous definition, any large-margin kernel is also a good similarity function in our sense, but the translation now results in a much milder increase in the labeled sample complexity. Second, we prove that for distribution-specific PAC learning, our new notion is strictly more powerful than the traditional notion of a large-margin kernel. In particular, we show that for any hypothesis class C there exists a similarity function under our definition allowing learning with O(log |C|) labeled examples. However, in a lower bound which may be of independent interest, we show that for any class C of pairwise uncorrelated functions, there is no kernel with margin γ ≥ 8/√|C| for all f ∈ C, even if one allows average hinge-loss as large as 0.5. Thus, the sample complexity for learning such classes with SVMs is Ω(|C|). This extends work of Ben-David et al. (2003) and Forster and Simon (2006), who give hardness results with comparable margin bounds, but at much lower error rates.

Our new notion of similarity relies upon L1 regularized learning, and our separation result is related to a separation result between what is learnable with L1 vs. L2 regularization.
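The similarity-as-features idea behind this line of work can be illustrated with a small sketch: map each example to the vector of its similarities with a set of landmark examples, then learn a sparse linear separator with L1 regularization over those features. Everything below is a hedged toy illustration, not the paper's algorithm or definitions: the Gaussian `sim`, the subgradient hinge-loss learner, and all parameter values are hypothetical choices made for the demo.

```python
# Sketch of similarity-based learning with L1 regularization (toy, assumed setup).
import numpy as np

rng = np.random.default_rng(0)

def sim(x, z, width=1.0):
    # A bounded similarity in [0, 1]; a hypothetical stand-in for a
    # "good similarity function" (no kernel validity assumed).
    return np.exp(-np.sum((x - z) ** 2) / (2 * width ** 2))

def featurize(X, landmarks):
    # Map each example x to (sim(x, z_1), ..., sim(x, z_d)) over the landmarks.
    return np.array([[sim(x, z) for z in landmarks] for x in X])

def train_l1_hinge(F, y, lam=0.01, lr=0.1, epochs=200):
    # Subgradient descent on average hinge loss plus an L1 penalty;
    # the L1 term encourages a sparse weighting of the landmarks.
    w = np.zeros(F.shape[1])
    for _ in range(epochs):
        margins = y * (F @ w)
        active = margins < 1                      # points violating the margin
        grad = -(F * y[:, None])[active].sum(axis=0) / len(y)
        w -= lr * (grad + lam * np.sign(w))
    return w

# Toy data: two Gaussian blobs labeled +1 / -1.
X = np.vstack([rng.normal(-2, 1, (40, 2)), rng.normal(2, 1, (40, 2))])
y = np.array([1] * 40 + [-1] * 40)

landmarks = X[rng.choice(len(X), 10, replace=False)]  # "unlabeled" landmarks
F = featurize(X, landmarks)
w = train_l1_hinge(F, y)
acc = np.mean(np.sign(F @ w) == y)
```

The point of the sketch is only the pipeline shape: similarity values become explicit features, and the labeled-sample cost is governed by learning a sparse linear predictor over them rather than by any implicit kernel space.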