Fast Parallel Markov Clustering in Bioinformatics Using Massively Parallel Computing on GPU with CUDA and ELLPACK-R Sparse Format




Bibliographic Details
Published in:	IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 9, no. 3, pp. 679–692
Main Authors:	Bustamam, A., Burrage, K., Hamilton, N. A.
Format:	Journal Article
Language:	English
Published:	United States: IEEE, 01.05.2012 (The Institute of Electrical and Electronics Engineers, Inc.)
Subjects:	Algorithms; Bioinformatics; Cluster Analysis; Clustering; Computation; Computational Biology - methods; Computer Graphics; Computer Simulation; CUDA; ELLPACK-R sparse format; Format; GPU computing; Graphics processing unit; graphs and networks; Instruction sets; Markov Chains; Markov clustering; Markov processes; Multicore processing; Networks; Oligonucleotide Array Sequence Analysis; Parallel processing; parallelism and concurrency; performance evaluation; PPI networks; Proteins; scalable parallel programming; Studies
ISSN:	1545-5963, 1557-9964

Description
Summary:	Markov clustering (MCL) is becoming a key algorithm within bioinformatics for determining clusters in networks. However, with the rapidly growing volume of biological network data, performance and scalability are becoming critical limiting factors in applications. Meanwhile, GPU computing, which uses the CUDA toolkit to implement a massively parallel computing environment on the GPU card, is becoming a very powerful, efficient, and low-cost option for achieving substantial performance gains over CPU approaches. Using on-chip memory on the GPU effectively lowers latency, thus circumventing a major issue in other parallel computing environments such as MPI. We introduce a very fast Markov clustering algorithm using CUDA (CUDA-MCL) that performs parallel sparse matrix-matrix computations and parallel sparse Markov matrix normalizations, which are at the heart of MCL. We use the ELLPACK-R sparse format to enable effective, fine-grained, massively parallel processing that copes with the sparse nature of interaction network data sets in bioinformatics applications. The results show that CUDA-MCL is significantly faster than the original MCL running on a CPU. Thus, large-scale parallel computation on off-the-shelf desktop machines, previously possible only on supercomputing architectures, can significantly change the way bioinformaticians and biologists deal with their data.
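The summary names the two kernels at the heart of MCL, sparse matrix-matrix products (expansion) and Markov matrix normalization (inflation), together with the ELLPACK-R storage format. The sketch below is a minimal, sequential Python illustration under stated assumptions, not the CUDA-MCL implementation: all names (`EllpackR`, `expand`, `inflate`, `mcl`, `components`) are hypothetical, each loop over rows stands in for one CUDA thread per row, and it stores the transpose of the usual column-stochastic MCL matrix so that rows sum to 1 (equivalent, since (M^T)^2 = (M^2)^T). The inflation power r = 2, the pruning threshold, and reading clusters off as connected components of the converged matrix are common MCL conventions assumed here.

```python
from typing import Dict, List, Tuple

Row = Dict[int, float]  # sparse row: column index -> value


class EllpackR:
    """ELLPACK-R storage: padded val/col arrays kept column-major, plus rl."""

    def __init__(self, n: int, rows: List[Row]):
        self.n = n
        self.rl = [len(r) for r in rows]   # true nonzero count per row
        self.k = max(self.rl, default=0)   # padded width = longest row
        self.val = [0.0] * (n * self.k)
        self.col = [0] * (n * self.k)
        for i, r in enumerate(rows):
            for j, (c, v) in enumerate(sorted(r.items())):
                # Slot j of row i lives at j*n + i, so on a GPU the threads
                # for rows i, i+1, ... read adjacent addresses (coalesced).
                self.val[j * n + i] = v
                self.col[j * n + i] = c

    def row(self, i: int) -> List[Tuple[int, float]]:
        # Stop at rl[i]: the padding beyond it is never read.
        return [(self.col[j * self.n + i], self.val[j * self.n + i])
                for j in range(self.rl[i])]


def normalize_rows(rows: List[Row]) -> List[Row]:
    """Rescale each row to sum to 1 (the Markov normalization step)."""
    out = []
    for r in rows:
        s = sum(r.values())
        out.append({c: v / s for c, v in r.items()} if s > 0 else dict(r))
    return out


def expand(a: EllpackR) -> List[Row]:
    """Sparse A*A; each output row is independent (one GPU thread each)."""
    result = []
    for i in range(a.n):
        acc: Row = {}
        for k, a_ik in a.row(i):
            for j, a_kj in a.row(k):
                acc[j] = acc.get(j, 0.0) + a_ik * a_kj
        result.append(acc)
    return result


def inflate(rows: List[Row], r: float, prune: float) -> List[Row]:
    """Raise entries to the power r, drop tiny ones, renormalize rows."""
    powered = normalize_rows([{c: v ** r for c, v in row.items()} for row in rows])
    kept = [{c: v for c, v in row.items() if v > prune} for row in powered]
    return normalize_rows(kept)


def components(rows: List[Row]) -> List[List[int]]:
    """Connected components of the nonzero pattern (union-find)."""
    parent = list(range(len(rows)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, row in enumerate(rows):
        for c in row:
            parent[find(i)] = find(c)
    groups: Dict[int, List[int]] = {}
    for i in range(len(rows)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def mcl(edges: List[Tuple[int, int]], n: int, r: float = 2.0,
        prune: float = 1e-6, max_iter: int = 100, tol: float = 1e-9) -> List[List[int]]:
    rows: List[Row] = [{i: 1.0} for i in range(n)]  # add self-loops
    for u, v in edges:
        rows[u][v] = rows[v][u] = 1.0                 # undirected graph
    rows = normalize_rows(rows)
    for _ in range(max_iter):
        new = inflate(expand(EllpackR(n, rows)), r, prune)
        diff = max(abs(new[i].get(c, 0.0) - rows[i].get(c, 0.0))
                   for i in range(n) for c in set(new[i]) | set(rows[i]))
        rows = new
        if diff < tol:
            break
    return components(rows)


# Two triangles joined by the single edge 2-3.
print(mcl([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)], 6))
```

Storing the slots column-major is what distinguishes ELLPACK-style formats from a plain row-wise layout: with one GPU thread per row, the j-th loads of neighboring threads are contiguous in memory. The `rl` array is the ELLPACK-R addition that lets each thread stop at its own row length instead of iterating over the full padded width, which matters for interaction networks whose node degrees vary widely.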
DOI:	10.1109/TCBB.2011.68