Efficient Updating of Biological Sequence Analyses
We present a novel approach for reducing the computational complexity of updating homologies produced by a wide class of popular state-of-the-art algorithms in comparative computational biology. The algorithms that we consider use hidden Markov models (HMMs) and a Viterbi recursion to evaluate match...
Gespeichert in:
| Veröffentlicht in: | IEEE journal of selected topics in signal processing Jg. 2; H. 3; S. 365 - 377 |
|---|---|
| Hauptverfasser: | , |
| Format: | Journal Article |
| Sprache: | Englisch |
| Veröffentlicht: |
New York
IEEE
01.06.2008
The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Schlagworte: | |
| ISSN: | 1932-4553, 1941-0484 |
| Online-Zugang: | Volltext |
| Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
| Zusammenfassung: | We present a novel approach for reducing the computational complexity of updating homologies produced by a wide class of popular state-of-the-art algorithms in comparative computational biology. The algorithms that we consider use hidden Markov models (HMMs) and a Viterbi recursion to evaluate matches between sequences, or between a sequence and models. Such updates occur frequently in practice as researchers discover errors in biological sequences or analyze multiple nearly similar sequences, e.g., in a family of proteins that underwent mutations during evolution. The proposed algorithm interprets the Viterbi recursion as an update of an optimal minimum spanning tree in a shortest path problem. We propose the novel concept of a relative node tolerance bound and show how it can be used to guarantee that one or more partial subtrees of a minimum spanning tree obtained before encountering the perturbations remain optimal. We also describe how to compute and use in real-time the relative node tolerance bounds to skip most unperturbed parts of a sequence while computing the new optimal solution. To further reduce the computational overhead associated with the tolerance bound evaluation, we present and exploit a statistical analysis of the matching procedure that estimates how many columns in the dynamic program that corresponds to the matching problem are affected by a change in a preceding column. The resulting "reusable" Viterbi decoding algorithm can update a matching result in less than a third to a fifth of the time required to compute a new match by performing a normal matching procedure, i.e., running a Viterbi algorithm with updated sequences against a base hidden Markov model. |
|---|---|
| Bibliographie: | ObjectType-Article-2 SourceType-Scholarly Journals-1 ObjectType-Feature-1 content type line 14 content type line 23 |
| ISSN: | 1932-4553 1941-0484 |
| DOI: | 10.1109/JSTSP.2008.924382 |