Rights Management

This is a copy of an article published in the Journal of Computational Biology © 2005 Mary Ann Liebert, Inc.; Journal of Computational Biology is available online at:

Abstract or Description

Statistical validation of gene clusters is imperative for many important applications in comparative genomics that depend on identifying genomic regions that are historically and/or functionally related. We develop the first rigorous statistical treatment of max-gap clusters, a cluster definition frequently used in empirical studies. We present exact expressions for the probability of observing an individual cluster of a set of marked genes in one genome, as well as upper and lower bounds on the probability of observing a cluster of h homologs in a pairwise whole-genome comparison. We demonstrate the utility of our approach by applying it to a whole-genome comparison of E. coli and B. subtilis. Code for statistical tests is available at.

Included in

Biology Commons

Published In

Journal of Computational Biology, 12(8), 1083-1102.