Handling biological sequence alignments on networked computing systems: A divide-and-conquer approach

Bibliographic Details
Published in:	Journal of Parallel and Distributed Computing, Vol. 69, no. 10, pp. 854-865
Main Authors:	Bharadwaj, Veeravalli; Wong, Han Min
Format:	Journal Article
Language:	English
Published:	Amsterdam: Elsevier Inc, 01.10.2009
Subjects:	Analytical, structural and metabolic biochemistry | Applied sciences | Biological and medical sciences | Biological sequences | Communication delay | Computer science; control theory; systems | Data processing. List processing. Character string processing | Divisible load theory | Exact sciences and technology | Fundamental and applied biological sciences. Psychology | Gene expression | General aspects | General aspects, investigation methods | Linear networks | Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects) | Memory organisation. Data processing | Molecular and cellular biology | Molecular genetics | Proteins | Sequence alignment | Smith–Waterman algorithm | Software | Data analysis | Computer simulation | Transmission protocol | Interconnected power system | DNA sequence | Processing time | Distributed computing | Transmission time | Workload | Minimum time | DNA | Optimal solution | Heuristic method | Database | Parallelism | Bioinformatics | Divide and conquer method
ISSN:	0743-7315, 1096-0848

Description
Summary:	In this paper, we address the biological sequence alignment problem, one of the most commonly used steps in bioinformatics applications. We employ the Divisible Load Theory (DLT) paradigm, which is suited to large-scale processing on network-based systems, to achieve a high degree of parallelism. Using the DLT paradigm, we propose a strategy that carefully partitions the computational workload among the processors in the system so as to minimize the overall time needed to determine the maximum similarity between DNA/protein sequences. We consider this computational problem on networked computing platforms connected as a linear daisy chain. We derive the individual load quantum to be assigned to each processor according to the computation speeds and communication link speeds along the chain. We consider two cases of sequence alignment, in which post-processing, i.e., the trace-back process required to determine an optimal alignment, may or may not be done at the individual processors. We derive critical conditions that determine whether our strategies yield an optimal processing time. When these conditions cannot be satisfied, we apply three different heuristic strategies proposed in the literature to generate sub-optimal solutions for the processing time. To evaluate the proposed schemes, we use real-life DNA samples of house mouse mitochondrion and human mitochondrion obtained from the public database GenBank [GenBank, http://www.ncbi.nlm.nih.gov] in our simulation experiments. Through this study, we demonstrate the applicability and potential of the DLT paradigm for such biological sequence related computational problems.
DOI:	10.1016/j.jpdc.2009.04.014
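
Illustration:	The summary refers to determining the maximum similarity between two sequences, and the subject list names the Smith–Waterman algorithm. The following is a minimal single-processor sketch of the textbook Smith–Waterman scoring recurrence, not the partitioned DLT strategy of the paper. The function name and the scoring values (match, mismatch, and linear gap penalty) are assumptions chosen for illustration only.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score between sequences a and b.

    Scoring values are illustrative assumptions, not taken from the paper.
    Only two rows of the score matrix are kept, so this yields the maximum
    similarity score but cannot perform the trace-back step.
    """
    prev = [0] * (len(b) + 1)  # scores for row i-1
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)  # scores for row i
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                   # start a new local alignment
                prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

Because each cell depends only on its left, upper, and upper-left neighbours, the score matrix can in principle be split into blocks and computed on different processors, which is the kind of workload a partitioning strategy would distribute. Recovering the alignment itself would require retaining the full matrix for trace-back.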