Date of Original Version
This is a copy of an article published in the Journal of Computational Biology © 2013 Mary Ann Liebert, Inc.; Journal of Computational Biology is available online at: http://online.liebertpub.com
Abstract or Description
Inference of gene interaction networks from expression data usually focuses on either supervised or unsupervised edge prediction from a single data source. However, in many real-world applications, multiple data sources, such as microarray and ISH (in situ hybridization) measurements of mRNA abundances, are available to offer multiview information about the same set of genes. We propose NP-MuScL to estimate a gene interaction network that is consistent with such multiple data sources, which are expected to reflect the same underlying relationships between the genes. NP-MuScL casts the network estimation problem as estimating the structure of a sparse undirected graphical model. We use the semiparametric Gaussian copula to model the distribution of the different data sources, with the different copulas sharing the same precision (i.e., inverse covariance) matrix, and we present an efficient algorithm to estimate such a model in the high-dimensional scenario. Results are reported on synthetic data, where NP-MuScL significantly outperforms baseline algorithms, even in the presence of noisy data sources. Experiments are also run on two real-world scenarios: two yeast microarray datasets and three Drosophila embryonic gene expression datasets, where NP-MuScL predicts a higher number of known gene interactions than existing techniques.
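The idea of several data sources sharing one precision matrix under a Gaussian copula can be illustrated with a minimal sketch. This is not the algorithm from the article: the function names `normal_scores` and `pooled_correlation`, the rank-based transform, and the sample-size weighting are all assumptions chosen for illustration. Each source is mapped to normal scores through its ranks, which makes the result invariant to monotone distortions of the measurements, and the per-source correlation matrices are then averaged into a single matrix.

```python
import numpy as np
from statistics import NormalDist


def normal_scores(X):
    """Map each column of X (samples x genes) to Gaussian normal scores via ranks.

    Illustrative rank-based copula transform; ties are broken by sort order.
    """
    n = X.shape[0]
    # Double argsort gives 0-based ranks per column; shift to 1..n.
    ranks = X.argsort(axis=0).argsort(axis=0) + 1
    # Rescale ranks into (0, 1) and apply the standard normal quantile function.
    inv_cdf = np.vectorize(NormalDist().inv_cdf)
    return inv_cdf(ranks / (n + 1.0))


def pooled_correlation(sources):
    """Sample-size-weighted average of per-source normal-score correlations.

    The weighting scheme is an assumption made for this sketch.
    """
    total = sum(X.shape[0] for X in sources)
    p = sources[0].shape[1]
    S = np.zeros((p, p))
    for X in sources:
        Z = normal_scores(X)
        S += (X.shape[0] / total) * np.corrcoef(Z, rowvar=False)
    return S


# Two synthetic "views" of the same 4 genes, each distorted by a different
# monotone function (hypothetical data, not from the article).
rng = np.random.default_rng(0)
latent_a = rng.standard_normal((200, 4))
latent_b = rng.standard_normal((80, 4))
source_a = np.exp(latent_a)   # e.g. an intensity-like scale
source_b = latent_b ** 3      # a different monotone distortion
S = pooled_correlation([source_a, source_b])
print(S.round(2))
```

In a sketch like this, the pooled matrix `S` would then be passed to an l1-penalized inverse covariance estimator (for example, a graphical lasso implementation) to obtain a sparse shared precision matrix, whose nonzero off-diagonal entries are read as candidate gene interactions.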
Journal of Computational Biology, 20(11), 892-904.