Date of Award


Embargo Period


Degree Name

Doctor of Philosophy (PhD)


Machine Learning


John Lafferty

Second Advisor

Larry Wasserman

Third Advisor

Christopher Genovese

Fourth Advisor

Zoubin Ghahramani

Fifth Advisor

Bin Yu


This thesis develops flexible and principled nonparametric learning algorithms to explore, understand, and make predictions from high dimensional and complex datasets. Such data appear frequently in modern scientific domains and lead to numerous important applications. For example, exploring high dimensional functional magnetic resonance imaging data helps us better understand brain function; inferring large-scale gene regulatory networks is crucial for new drug design and development; and detecting anomalies in high dimensional transaction databases is vital for corporate and government security.

Our main results include a rigorous theoretical framework and efficient nonparametric learning algorithms that exploit hidden structures to overcome the curse of dimensionality when analyzing massive high dimensional datasets. These algorithms have strong theoretical guarantees and provide high dimensional nonparametric recipes for many important learning tasks, ranging from unsupervised exploratory data analysis to supervised predictive modeling. In this thesis, we address three aspects:

1. Understanding the statistical theory of high dimensional nonparametric inference, including risk, estimation, and model selection consistency;

2. Designing new methods for different data-analysis tasks, including regression, classification, density estimation, graphical model learning, multi-task learning, and spatial-temporal adaptive learning;

3. Demonstrating the usefulness of these methods in scientific applications, including functional genomics, cognitive neuroscience, and meteorology.

In the last part of this thesis, we also outline a vision for future work on high dimensional and large-scale nonparametric inference.