This is a copy of an article published in the Journal of Computational Biology © 2005 Mary Ann Liebert, Inc.; Journal of Computational Biology is available online at: http://online.liebertpub.com.
Abstract or Description
Statistical validation of gene clusters is imperative for many important applications in comparative genomics which depend on the identification of genomic regions that are historically and/or functionally related. We develop the first rigorous statistical treatment of max-gap clusters, a cluster definition frequently used in empirical studies. We present exact expressions for the probability of observing an individual cluster of a set of marked genes in one genome, as well as upper and lower bounds on the probability of observing a cluster of h homologs in a pairwise whole-genome comparison. We demonstrate the utility of our approach by applying it to a whole-genome comparison of E. coli and B. subtilis. Code for statistical tests is available at.
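The abstract does not spell out the max-gap criterion or the null model, so the following Python is only a minimal sketch under stated assumptions. It assumes that genes sit at integer positions on a single linear chromosome, that a set of marked genes forms a max-gap cluster when no two consecutive marked genes are separated by more than g intervening genes, and that the m marked genes are placed uniformly at random among n genes. It does not reproduce the paper's exact expressions or bounds, nor the authors' released code. Instead, it computes the single-genome probability of a cluster of at least h marked genes by brute-force enumeration (small n) and estimates it by Monte Carlo simulation. All function names are hypothetical.

```python
import random
from itertools import combinations


def max_gap_clusters(positions, g):
    """Split marked-gene positions into maximal max-gap clusters.

    Assumption: positions are gene indices on one linear chromosome, and the
    gap between consecutive marked genes is the number of genes strictly
    between them (cur - prev - 1). A run is cut wherever that gap exceeds g.
    """
    pos = sorted(positions)
    if not pos:
        return []
    clusters = [[pos[0]]]
    for prev, cur in zip(pos, pos[1:]):
        if cur - prev - 1 <= g:
            clusters[-1].append(cur)  # gap small enough: extend current cluster
        else:
            clusters.append([cur])    # gap too large: start a new cluster
    return clusters


def has_cluster_of_size(positions, g, h):
    """True if some max-gap cluster contains at least h marked genes."""
    return any(len(c) >= h for c in max_gap_clusters(positions, g))


def exact_probability_bruteforce(n, m, g, h):
    """Exact P(cluster of >= h marked genes) under the assumed uniform null
    model, found by enumerating every way to mark m of n genes (small n only)."""
    total = hits = 0
    for marked in combinations(range(n), m):
        total += 1
        hits += has_cluster_of_size(marked, g, h)
    return hits / total


def mc_probability(n, m, g, h, trials=10_000, seed=0):
    """Monte Carlo estimate of the same probability for larger genomes."""
    rng = random.Random(seed)
    hits = sum(
        has_cluster_of_size(rng.sample(range(n), m), g, h) for _ in range(trials)
    )
    return hits / trials


print(max_gap_clusters([1, 2, 5, 20, 22], g=2))  # [[1, 2, 5], [20, 22]]
print(exact_probability_bruteforce(4, 2, 0, 2))  # 0.5
```

Enumeration gives exact values for toy genomes and can serve as a check on the simulation. For realistic genome sizes, the Monte Carlo estimate is the practical option in this sketch, although very small probabilities need many trials to estimate reliably.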
Journal of Computational Biology, 12(8), 1083-1102.