Date of Original Version
Abstract or Description
Protein fold recognition is a key step towards inferring the tertiary structures from amino-acid sequences. Complex folds such as those consisting of interacting structural repeats are prevalent in proteins involved in a wide spectrum of biological functions. However, extant approaches often perform inadequately due to their inability to capture long-range interactions between structural units and to handle low sequence similarities across proteins (under 25% identity). In this paper, we propose a chain graph model built on a causally connected series of segmentation conditional random fields (SCRFs) to address these issues. Specifically, the SCRF model captures long-range interactions within recurring structural units and the Bayesian network backbone decomposes cross-repeat interactions into locally computable modules consisting of repeat-specific SCRFs and a model for sequence motifs. We applied this model to predict β -helices and leucine-rich repeats, and found it significantly outperforms extant methods in predictive accuracy and/or computational efficiency.