A fast parallel algorithm for finding the longest common sequence of multiple biosequences

Bibliographic details
Published in:	BMC Bioinformatics, Vol. 7, Issue S4, p. S4
Main authors:	Chen, Yixin; Wan, Andrew; Liu, Wei
Format:	Journal Article
Language:	English
Published:	England, BioMed Central (BMC), 12 December 2006
Subjects:	Algorithms; Computing Methodologies; Conserved Sequence; Sequence Alignment - methods; Sequence Analysis - methods; Sequence Homology; Time Factors
ISSN:	1471-2105

Description
Abstract:	Searching for the longest common sequence (LCS) of multiple biosequences is one of the most fundamental tasks in bioinformatics. In this paper, we present a parallel algorithm named FAST_LCS to speed up the computation of the LCS. The algorithm first constructs a novel successor table to obtain all the identical character pairs and their levels. It then obtains the LCS by tracing back from the identical character pairs at the last level. Effective pruning techniques are developed to significantly reduce the computational complexity. Experimental results on gene sequences in the TIGR database show that our algorithm is optimal and much more efficient than other leading LCS algorithms. We have developed one of the fastest parallel LCS algorithms on an MPP parallel computing model. For two sequences X and Y with lengths n and m, respectively, the memory required is max{4(n+1)+4(m+1), L}, where L is the number of identical character pairs. The time complexity is O(L) for sequential execution and O(|LCS(X, Y)|) for parallel execution, where |LCS(X, Y)| is the length of the LCS of X and Y. For n sequences X1, X2, ..., Xn, the time complexity is O(L) for sequential execution and O(|LCS(X1, X2, ..., Xn)|) for parallel execution. Experimental results support our analysis by showing significant improvement of the proposed method over other leading LCS algorithms.
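Illustration:	The steps named in the abstract (successor table, identical character pairs grouped by level, pruning, trace-back from the last level) can be pictured with the small sequential Python sketch below for two sequences. It is not the authors' FAST_LCS implementation: the function names, the 1-based position convention, the virtual start pair (0, 0), and the simple dominance rule used for pruning are all assumptions made for illustration, and nothing here is parallel.

```python
def successor_table(seq, alphabet):
    """table[c][j] = smallest 1-based position k > j with seq[k-1] == c, else None.

    Layout is an assumption for this sketch; index j = 0 means "before the sequence".
    """
    n = len(seq)
    table = {c: [None] * (n + 1) for c in alphabet}
    for c in alphabet:
        nxt = None
        for j in range(n, 0, -1):      # scan right to left
            if seq[j - 1] == c:
                nxt = j
            table[c][j - 1] = nxt
    return table


def prune(pairs):
    """Drop pairs dominated by another pair on the same level.

    Assumed rule: (i, j) is dropped if some other pair (i2, j2) has
    i2 <= i and j2 <= j, since anything reachable from (i, j) is then
    also reachable from (i2, j2).
    """
    kept = {}
    best_j = None
    for (i, j) in sorted(pairs):
        if best_j is None or j < best_j:
            kept[(i, j)] = pairs[(i, j)]
            best_j = j
    return kept


def lcs_by_levels(x, y):
    """Return one longest common subsequence of x and y (hypothetical helper)."""
    alphabet = sorted(set(x) & set(y))
    tx = successor_table(x, alphabet)
    ty = successor_table(y, alphabet)

    levels = []                  # levels[k] maps pair -> parent pair on level k-1
    frontier = {(0, 0): None}    # virtual start pair before both sequences
    while True:
        nxt = {}
        for (i, j) in frontier:
            for c in alphabet:
                si, sj = tx[c][i], ty[c][j]
                if si is not None and sj is not None:
                    # (si, sj) is an identical character pair: x[si-1] == y[sj-1] == c
                    nxt.setdefault((si, sj), (i, j))
        if not nxt:
            break
        frontier = prune(nxt)
        levels.append(frontier)

    if not levels:
        return ""

    # Trace back from any pair on the last level; the number of levels is the LCS length.
    pair = next(iter(levels[-1]))
    chars = []
    for level in reversed(levels):
        chars.append(x[pair[0] - 1])
        pair = level[pair]
    return "".join(reversed(chars))
```

For example, `lcs_by_levels("ABCBDAB", "BDCABA")` returns a common subsequence of length 4; which of the equally long candidates is returned depends on the order in which pairs are visited.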
DOI:	10.1186/1471-2105-7-S4-S4