Date of Original Version




Abstract or Description

We consider the problem of multiple sequence alignment under a fixed evolutionary tree: given a tree whose leaves are labeled by sequences, find ancestral sequences to label its internal nodes so as to minimize the total length of the tree, where the length of an edge is the edit distance between the sequences labeling its endpoints. We present a new polynomial-time approximation algorithm for this problem, and analyze its performance on regular d-ary trees with d a constant. On such a tree, the algorithm finds a solution within a factor (d + 1)/(d − 1) of the minimum in O(kdT(d, n) + k2dn2) time, where k is the number of leaves in the tree, n is the length of the longest sequence labeling a leaf, and T(d, n) is the time to compute a Steiner point for d sequences of length at most n. (A Steiner point for a set S of sequences is a sequence P that minimizes the sum of the edit distances from P to each sequence in S . The time T(d, n) is O(d2dnd), given O(dsd+1)-time preprocessing for an alphabet of size s.) The approximation algorithm is conceptually simple and easy to implement, and actually applies to any metric space in which a Steiner point for any fixed-sized set can be computed in polynomial time. We also introduce a new problem, bottleneck tree-alignment, in which the objective is to label the internal nodes of the tree so as to minimize the length of the longest edge. We describe an exponential-time exact algorithm for the case of unit-cost edit operations, and show there is a simple linear-time approximation algorithm for the general case that finds a solution within a factor O(logk) of the minimum.





Published In

Discrete Applied Mathematics , 88, 1-3, 355-366.