A coarse-grained multicomputer parallel algorithm for the sequential substring constrained longest common subsequence problem

Bibliographic details
Published in: Parallel Computing, Vol. 111, p. 102927
Main authors: Kengne Tchendji, Vianney; Bogning Tepiele, Hermann; Akong Onabid, Mathias; Myoupo, Jean Frédéric; Lacmou Zeutouo, Jerry
Format: Journal Article
Language: English
Publisher: Elsevier B.V., 01.07.2022
ISSN: 0167-8191, 1872-7336
Abstract
In this paper, we study the sequential substring constrained longest common subsequence (SSCLCS) problem, which is widely used in bioinformatics. Given two strings X and Y of lengths m and n over an alphabet Σ, and a constraint sequence C of ordered strings (c1, c2, …, cl) with total length r, the SSCLCS problem is to find a longest common subsequence D of X and Y such that D contains c1, c2, …, cl as substrings, in that order. To solve this problem, Tseng et al. proposed a dynamic-programming algorithm that runs in O(mnr + (m+n)|Σ|) time. Building on this work, we propose a parallel algorithm for the SSCLCS problem on the Coarse-Grained Multicomputer (CGM) model. We design a three-dimensional partitioning technique for the corresponding dependency graph that reduces processor latency by keeping the subproblems assigned to the processors small at each step; it also minimizes the number of communications between processors. Our solution requires O((nmr + (m+n)|Σ|)/p) execution time with O(p) communication rounds on p processors. The experimental results show that our solution achieves a speedup of up to 59.7 on 64 processors, which is better than the CGM-based parallel techniques that have been used to solve similar problems.
Highlights:
• Describing a task graph that follows Tseng et al.'s recursive formula
• Describing a three-dimensional partitioning strategy and a distribution scheme
• Experimental study on a real parallel machine with existing DNA data sets
• Comparison of the empirical results of the proposed strategy with those of previous strategies
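The abstract does not reproduce Tseng et al.'s recurrence, so the following is only a minimal Python sketch of what the sequential problem computes. It is a straightforward dynamic program, not Tseng et al.'s algorithm and not the authors' CGM algorithm. It assumes that the constraint strings must occur in D as non-overlapping substrings from left to right, and it tracks progress through C with one KMP automaton per constraint string, laid end to end so that the states are numbered 0..r. Each cell (i, j, k) holds the length of a longest common subsequence of X[:i] and Y[:j] that leaves this automaton in state k, so, like the paper's dependency graph, the table is three-dimensional. The function names `kmp_delta`, `build_transitions` and `ssclcs_length` are hypothetical.

```python
from typing import Dict, List, Optional, Set

NEG = float("-inf")  # marks unreachable DP cells


def kmp_delta(p: str, alphabet: Set[str]) -> List[Dict[str, int]]:
    """KMP automaton transitions delta[q][a] for states q = 0..len(p)-1 of pattern p.

    Reaching state len(p) means that p has just occurred as a substring.
    """
    L = len(p)
    fail = [0] * L  # fail[q]: length of the longest proper border of p[:q + 1]
    for q in range(1, L):
        f = fail[q - 1]
        while f > 0 and p[q] != p[f]:
            f = fail[f - 1]
        if p[q] == p[f]:
            f += 1
        fail[q] = f
    delta: List[Dict[str, int]] = [dict() for _ in range(L)]
    for q in range(L):
        for a in alphabet:
            if p[q] == a:
                delta[q][a] = q + 1
            elif q == 0:
                delta[q][a] = 0
            else:
                delta[q][a] = delta[fail[q - 1]][a]  # fall back along the border
    return delta


def build_transitions(constraints: List[str], alphabet: Set[str]) -> List[Dict[str, int]]:
    """Global automaton over states 0..r: state offset_t + q means c_1..c_{t-1} are
    matched (greedily, without overlap) and q characters of c_t are pending."""
    r = sum(len(c) for c in constraints)
    step: List[Dict[str, int]] = [dict() for _ in range(r + 1)]
    offset = 0
    for c in constraints:
        delta = kmp_delta(c, alphabet)
        for q in range(len(c)):
            for a in alphabet:
                # offset + len(c) is exactly the start state of the next constraint
                step[offset + q][a] = offset + delta[q][a]
        offset += len(c)
    for a in alphabet:
        step[r][a] = r  # every constraint matched: absorbing state
    return step


def ssclcs_length(X: str, Y: str, constraints: List[str]) -> Optional[int]:
    """Length of a longest common subsequence of X and Y that contains the
    constraint strings in order, or None if no common subsequence does."""
    alphabet = set(X) | set(Y) | set("".join(constraints))
    step = build_transitions(constraints, alphabet)
    m, n, r = len(X), len(Y), len(step) - 1
    dp = [[[NEG] * (r + 1) for _ in range(n + 1)] for _ in range(m + 1)]
    dp[0][0][0] = 0
    for i in range(m + 1):
        for j in range(n + 1):
            for k in range(r + 1):
                v = dp[i][j][k]
                if v == NEG:
                    continue
                if i < m and v > dp[i + 1][j][k]:  # skip X[i]
                    dp[i + 1][j][k] = v
                if j < n and v > dp[i][j + 1][k]:  # skip Y[j]
                    dp[i][j + 1][k] = v
                if i < m and j < n and X[i] == Y[j]:  # append the matched character to D
                    k2 = step[k][X[i]]
                    if v + 1 > dp[i + 1][j + 1][k2]:
                        dp[i + 1][j + 1][k2] = v + 1
    best = dp[m][n][r]
    return None if best == NEG else best
```

For example, `ssclcs_length("ABCDE", "ABCDE", ["AC"])` gives 4 (D = "ACDE"), because the unconstrained answer "ABCDE" does not contain "AC" as a substring. This sketch performs O(mnr) cell updates after building an O(r|Σ|) transition table and stores the whole O(mnr) table, so it is meant for illustration on small inputs only.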
DOI:10.1016/j.parco.2022.102927