A coarse-grained multicomputer parallel algorithm for the sequential substring constrained longest common subsequence problem

Bibliographic details
Published in: Parallel Computing, Vol. 111, p. 102927
Main authors: Kengne Tchendji, Vianney; Bogning Tepiele, Hermann; Akong Onabid, Mathias; Myoupo, Jean Frédéric; Lacmou Zeutouo, Jerry
Format: Journal Article
Language: English
Published: Elsevier B.V., 1 July 2022
ISSN:0167-8191, 1872-7336
Online access: Full text
Description
Summary: In this paper, we study the sequential substring constrained longest common subsequence (SSCLCS) problem, which is widely used in bioinformatics. Given two strings X and Y of respective lengths m and n over an alphabet Σ, and a constraint sequence C formed by ordered strings (c1, c2, …, cl) with total length r, the SSCLCS problem is to find the longest common subsequence D of X and Y such that D contains c1, c2, …, cl as substrings, in that order. To solve this problem, Tseng et al. proposed a dynamic-programming algorithm that runs in O(mnr + (m+n)|Σ|) time. We build on this work to propose a parallel algorithm for the SSCLCS problem on the Coarse-Grained Multicomputer (CGM) model. We design a three-dimensional partitioning technique for the corresponding dependency graph that reduces processor latency by keeping the subproblems that processors solve at each step small. This partitioning also minimizes the number of communications between processors. Our solution requires O((nmr + (m+n)|Σ|)/p) execution time with O(p) communication rounds on p processors. The experimental results show that our solution achieves a speedup of up to 59.7 on 64 processors. This is better than the CGM-based parallel techniques that have been used to solve similar problems.
• Describing a task graph following Tseng et al.'s recursive formula
• Describing a three-dimensional partitioning strategy and a distribution scheme
• Experimental study on a real parallel machine with existing DNA data sets
• Comparing the empirical results between the proposed strategy and previous strategies
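To make the problem statement concrete, the following is a minimal sequential sketch in Python that computes only the length of an SSCLCS. It is not the algorithm of Tseng et al. and not the CGM algorithm of the paper: it is a plain three-dimensional dynamic program over (position in X, position in Y, number of constraint characters matched) and runs in O(mnr) time. The function name `ssclcs_length` is hypothetical, and the sketch assumes that every constraint is non-empty and that the occurrences of c1, c2, …, cl in D do not overlap.

```python
from itertools import accumulate

NEG = float("-inf")  # marks states from which the constraints cannot be completed


def ssclcs_length(x, y, constraints):
    """Length of a longest common subsequence D of x and y that contains
    every string in `constraints` as a substring, in the given order.
    Returns None when no such D exists.
    Assumption: constraints are non-empty and occupy disjoint parts of D."""
    pattern = "".join(constraints)  # c1 c2 ... cl concatenated, total length r
    r = len(pattern)
    # Offsets in `pattern` where no constraint is partially matched.
    boundaries = {0, *accumulate(len(c) for c in constraints)}
    m, n = len(x), len(y)

    # best[i][j][k]: best length obtainable from x[i:] and y[j:] when k
    # constraint characters have already been placed in D.
    best = [[[NEG] * (r + 1) for _ in range(n + 1)] for _ in range(m + 1)]
    for i in range(m, -1, -1):
        for j in range(n, -1, -1):
            for k in range(r, -1, -1):
                if i == m or j == n:
                    # Nothing left to match: valid only if all constraints are done.
                    best[i][j][k] = 0 if k == r else NEG
                    continue
                # Skip one character of x or of y.
                value = max(best[i + 1][j][k], best[i][j + 1][k])
                if x[i] == y[j]:
                    if k in boundaries:
                        # Free character between (or around) constraints.
                        value = max(value, 1 + best[i + 1][j + 1][k])
                    if k < r and x[i] == pattern[k]:
                        # Next character of the current constraint.
                        value = max(value, 1 + best[i + 1][j + 1][k + 1])
                best[i][j][k] = value

    result = best[0][0][0]
    return None if result == NEG else result
```

For example, with X = Y = "ABAB" and the single constraint "BB", the unconstrained LCS has length 4, but D must contain "BB" contiguously, so the best choice is "ABB" and the function returns 3. The three loop indices correspond to the three dimensions of the dependency graph that the summary mentions; how that graph is partitioned across processors is not shown here.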
DOI: 10.1016/j.parco.2022.102927