Supernodal sparse Cholesky factorization on graphics processing units

SUMMARY Sparse Cholesky factorization is the most computationally intensive component in solving large sparse linear systems and is the core algorithm of numerous scientific computing applications. A large number of sparse Cholesky factorization algorithms have previously emerged, exploiting archite...

Full description

Saved in:
Bibliographic Details
Published in:Concurrency and computation Vol. 26; no. 16; pp. 2713 - 2726
Main Authors: Zou, Dan, Dou, Yong, Guo, Song, Li, Rongchun, Deng, Lin
Format: Journal Article
Language:English
Published: Blackwell Publishing Ltd 01.11.2014
Subjects:
ISSN:1532-0626, 1532-0634
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:SUMMARY Sparse Cholesky factorization is the most computationally intensive component in solving large sparse linear systems and is the core algorithm of numerous scientific computing applications. A large number of sparse Cholesky factorization algorithms have previously emerged, exploiting architectural features for various computing platforms. The recent use of graphics processing units (GPUs) to accelerate structured parallel applications shows the potential to achieve significant acceleration relative to desktop performance. However, sparse Cholesky factorization has not been explored sufficiently because of the complexity involved in its efficient implementation and the concerns of low GPU utilization. In this paper, we present a new approach for sparse Cholesky factorization on GPUs. We present the organization of the sparse matrix supernode data structure for GPU and propose a queue‐based approach for the generation and scheduling of GPU tasks with dense linear algebraic operations. We also design a subtree‐based parallel method for multi‐GPU system. These approaches increase GPU utilization, thus resulting in substantial computational time reduction. Comparisons are made with the existing parallel solvers by using problems arising from practical applications. The experiment results show that the proposed approaches can substantially improve sparse Cholesky factorization performance on GPUs. Relative to a highly optimized parallel algorithm on a 12‐core node, we were able to obtain speedups in the range 1.59× to 2.31× by using one GPU and 1.80× to 3.21× by using two GPUs. Relative to a state‐of‐the‐art solver based on supernodal method for CPU‐GPU heterogeneous platform, we were able to obtain speedups in the range 1.52× to 2.30× by using one GPU and 2.15× to 2.76× by using two GPUs. Concurrency and Computation: Practice and Experience, 2013. Copyright © 2013 John Wiley & Sons, Ltd.
Bibliography:istex:28E4EBE69BC931F153728A2B7ADE96B1B45BCDE2
ArticleID:CPE3158
ark:/67375/WNG-Q4W5TMV8-B
ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ISSN:1532-0626
1532-0634
DOI:10.1002/cpe.3158