Efficient Updating of Biological Sequence Analyses

Bibliographic Details
Published in: IEEE Journal of Selected Topics in Signal Processing, Vol. 2, no. 3, pp. 365-377
Main Authors: Hong, Changjin; Tewfik, A.H.
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.06.2008
ISSN: 1932-4553, 1941-0484
Description
Summary: We present a novel approach for reducing the computational complexity of updating homologies produced by a wide class of popular state-of-the-art algorithms in comparative computational biology. The algorithms that we consider use hidden Markov models (HMMs) and a Viterbi recursion to evaluate matches between sequences, or between a sequence and a model. Such updates occur frequently in practice as researchers discover errors in biological sequences or analyze multiple highly similar sequences, e.g., a family of proteins that underwent mutations during evolution. The proposed algorithm interprets the Viterbi recursion as an update of an optimal minimum spanning tree in a shortest path problem. We propose the novel concept of a relative node tolerance bound and show how it can be used to guarantee that one or more partial subtrees of a minimum spanning tree obtained before encountering the perturbations remain optimal. We also describe how to compute the relative node tolerance bounds in real time and use them to skip most unperturbed parts of a sequence while computing the new optimal solution. To further reduce the computational overhead of evaluating the tolerance bounds, we present and exploit a statistical analysis of the matching procedure that estimates how many columns of the dynamic program for the matching problem are affected by a change in a preceding column. The resulting "reusable" Viterbi decoding algorithm can update a matching result in a fifth to less than a third of the time required to compute a new match with a normal matching procedure, i.e., running the Viterbi algorithm on the updated sequences against a base hidden Markov model.
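The idea of reusing an earlier Viterbi result after a small change can be illustrated with a minimal Python sketch. It is not the article's relative node tolerance bound: as a stand-in, it assumes a much simpler stopping rule, namely that once a recomputed column differs from the stored column by the same constant in every state, all later backpointers are unchanged and the remaining columns only need that constant added. The function names (`step`, `viterbi`, `update`, `backtrack`), the tolerance `tol`, and the toy two-state model are all hypothetical, and only single-symbol substitutions are handled.

```python
import math

def step(prev, symbol, log_trans, log_emit):
    """One Viterbi column: best predecessor and log-score for every state."""
    n = len(prev)
    col, ptr = [], []
    for j in range(n):
        best = max(range(n), key=lambda i: prev[i] + log_trans[i][j])
        col.append(prev[best] + log_trans[best][j] + log_emit[j][symbol])
        ptr.append(best)
    return col, ptr

def first_column(symbol, log_init, log_emit):
    n = len(log_init)
    return [log_init[j] + log_emit[j][symbol] for j in range(n)], [-1] * n

def viterbi(obs, log_init, log_trans, log_emit):
    """Full pass; returns the score columns and the backpointer columns."""
    col, ptr = first_column(obs[0], log_init, log_emit)
    delta, back = [col], [ptr]
    for symbol in obs[1:]:
        col, ptr = step(delta[-1], symbol, log_trans, log_emit)
        delta.append(col)
        back.append(ptr)
    return delta, back

def backtrack(delta, back):
    """Recover the best state path from the stored columns."""
    state = max(range(len(delta[-1])), key=lambda j: delta[-1][j])
    path = [state]
    for t in range(len(delta) - 1, 0, -1):
        state = back[t][state]
        path.append(state)
    return path[::-1]

def update(obs, pos, new_symbol, delta, back,
           log_init, log_trans, log_emit, tol=1e-9):
    """Substitute obs[pos] and repair delta/back in place.

    Assumed stopping rule (not the article's bound): stop recomputing once
    the new column equals the old one plus a constant, within tol.
    Returns the number of columns that were fully recomputed.
    """
    obs[pos] = new_symbol
    recomputed = 0
    for t in range(pos, len(obs)):
        if t == 0:
            col, ptr = first_column(obs[0], log_init, log_emit)
        else:
            col, ptr = step(delta[t - 1], obs[t], log_trans, log_emit)
        shifts = [c - o for c, o in zip(col, delta[t])]
        delta[t], back[t] = col, ptr
        recomputed += 1
        if max(shifts) - min(shifts) <= tol:
            # Uniform shift: later backpointers are reused as they are,
            # and later scores only need the constant added.
            for u in range(t + 1, len(obs)):
                delta[u] = [v + shifts[0] for v in delta[u]]
            break
    return recomputed

if __name__ == "__main__":
    # Hypothetical two-state model over a binary alphabet.
    log_init = [math.log(0.6), math.log(0.4)]
    log_trans = [[math.log(0.7), math.log(0.3)],
                 [math.log(0.4), math.log(0.6)]]
    log_emit = [[math.log(0.9), math.log(0.1)],
                [math.log(0.2), math.log(0.8)]]
    seq = [0, 0, 1, 0, 1, 1, 0, 0, 1, 0]
    delta, back = viterbi(seq, log_init, log_trans, log_emit)
    n = update(seq, 3, 1, delta, back, log_init, log_trans, log_emit)
    print("columns recomputed:", n, "of", len(seq))
    print("updated path:", backtrack(delta, back))
```

Shifting the remaining columns costs one addition per state, whereas a full recomputation costs a maximisation over all predecessor states, so the saving grows with the number of states. How early the uniform-shift condition fires depends on the model and the data.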
DOI: 10.1109/JSTSP.2008.924382