Date of Award

9-2013

Embargo Period

8-17-2014

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Language Technologies Institute

Advisor(s)

Christopher James Langmead

Abstract

Generative models of protein structure enable researchers to predict the behavior of proteins under different conditions. Continuous graphical models are powerful and efficient tools for modeling static and dynamic distributions, which can be used for learning generative models of molecular dynamics. In this thesis, we develop new and improved continuous graphical models, to be used in modeling of protein structure. We first present von Mises graphical models, and develop consistent and efficient algorithms for sparse structure learning and parameter estimation, and inference. We compare our model to sparse Gaussian graphical model and show it outperforms GGMs on synthetic and Engrailed protein molecular dynamics datasets. Next, we develop algorithms to estimate Mixture of von Mises graphical models using Expectation Maximization, and show that these models outperform Von Mises, Gaussian and mixture of Gaussian graphical models in terms of accuracy of prediction in imputation test of non-redundant protein structure datasets. We then use non-paranormal and nonparametric graphical models, which have extensive representation power, and compare several state of the art structure learning methods that can be used prior to nonparametric inference in reproducing kernel Hilbert space embedded graphical models. To be able to take advantage of the nonparametric models, we also propose feature space embedded belief propagation, and use random Fourier based feature approximation in our proposed feature belief propagation, to scale the inference algorithm to larger datasets. To improve the scalability further, we also show the integration of Coreset selection algorithm with the nonparametric inference, and show that the combined model scales to large datasets with very small adverse effect on the quality of predictions. Finally, we present time varying sparse Gaussian graphical models, to learn smoothly varying graphical models of molecular dynamics simulation data, and present results on CypA protein

Share

COinS