A heterogeneous parallel implementation of the Markov clustering algorithm for large-scale biological networks on distributed CPU–GPU clusters

Bibliographic details
Published in:	The Journal of Supercomputing, Vol. 78, No. 7, pp. 9017-9037
Main authors:	Fu, You; Zhou, Wei
Format:	Journal Article
Language:	English
Published:	New York: Springer US, 1 May 2022; Springer Nature B.V.
Subjects:	Algorithms; Clustering; Compilers; Computer Science; Data transmission; Interpreters; Processor Architectures; Programming Languages; Proteins; Parallel computing; Biological interaction network; Compute Unified Device Architecture; Heterogeneous computing; Cluster computing
ISSN:	0920-8542, 1573-0484

Description
Summary:	Biological interaction databases store information about interacting proteins or genes. Clustering the networks formed from this interaction information to find highly connected regions can reveal functional affinities or structural similarities between protein or gene entities. With the ever-increasing amount of information in these databases, the runtime of a clustering task is becoming increasingly unaffordable. In this paper, we propose a heterogeneous parallel algorithm that accelerates clustering tasks on distributed CPU–GPU clusters. Our parallel implementation is based on the original serial Markov clustering (MCL) algorithm and uses both CPUs and GPUs to exploit the power of heterogeneous platforms. Using the BioGRID biological interaction database, we tested the proposed algorithm on a computer cluster equipped with NVIDIA Tesla P100 GPU accelerators. The results show that the algorithm is efficient in GPU memory usage and inter-node data transmission, and that it can complete the clustering task in 3.2 minutes, with a best speedup of 70.02 times over its serial counterpart. We believe our work can provide key insights for realizing fast MCL analyses on large-scale biological data with distributed CPU–GPU computer clusters.
DOI:	10.1007/s11227-021-04204-6