Date of Original Version

10-1-2005

Type

Article

PubMed ID

16241899

Rights Management

This is a copy of an article published in the Journal of Computational Biology © 2005 Mary Ann Liebert, Inc.; Journal of Computational Biology is available online at: http://online.liebertpub.com.

Abstract or Description

Statistical validation of gene clusters is imperative for many important applications in comparative genomics which depend on the identification of genomic regions that are historically and/or functionally related. We develop the first rigorous statistical treatment of max-gap clusters, a cluster definition frequently used in empirical studies. We present exact expressions for the probability of observing an individual cluster of a set of marked genes in one genome, as well as upper and lower bounds on the probability of observing a cluster of h homologs in a pairwise whole-genome comparison. We demonstrate the utility of our approach by applying it to a whole-genome comparison of E. coli and B. subtilis. Code for statistical tests is available at.
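The abstract does not define a max-gap cluster. The following is a minimal sketch that assumes the usual convention: a set of marked genes forms a max-gap cluster with gap size g when no two consecutive marked genes are separated by more than g unmarked genes. It also assumes a single linear chromosome of n genes on which every placement of the m marked genes is equally likely. The sketch only addresses the single-genome question "do all m marked genes form one max-gap cluster?". It is not the authors' released code, and it does not reproduce their exact expressions or their pairwise-comparison bounds. Instead it substitutes two plainly named techniques: brute-force enumeration, which is exact but only feasible for small n, and Monte Carlo simulation. The function names are hypothetical.

```python
import itertools
import random
from fractions import Fraction
from math import comb
from typing import Sequence


def is_max_gap_cluster(positions: Sequence[int], g: int) -> bool:
    # Assumed convention: the gap between two consecutive marked genes is the
    # number of unmarked genes lying strictly between them.
    pos = sorted(positions)
    return all(b - a - 1 <= g for a, b in zip(pos, pos[1:]))


def exact_probability_bruteforce(n: int, m: int, g: int) -> Fraction:
    # Enumerate every placement of m marked genes among n positions on a
    # linear chromosome (all placements assumed equally likely).
    hits = sum(
        1
        for c in itertools.combinations(range(n), m)
        if is_max_gap_cluster(c, g)
    )
    return Fraction(hits, comb(n, m))


def monte_carlo_probability(
    n: int, m: int, g: int, trials: int = 100_000, seed: int = 0
) -> float:
    # Sample random placements and count how often the marked genes
    # form a single max-gap cluster.
    rng = random.Random(seed)
    hits = sum(is_max_gap_cluster(rng.sample(range(n), m), g) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    n, m, g = 20, 4, 2
    print("exact (brute force):", float(exact_probability_bruteforce(n, m, g)))
    print("Monte Carlo estimate:", monte_carlo_probability(n, m, g))
```

Brute force serves as a check on the simulation for small genomes, where it is exact. Enumeration grows as C(n, m) and quickly becomes infeasible at whole-genome scale, which is where closed-form expressions such as those the abstract describes become necessary.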

DOI

10.1089/cmb.2005.12.1083

Included in

Biology Commons


Published In

Journal of Computational Biology, 12(8), 1083-1102.