Supernodal sparse Cholesky factorization on graphics processing units

Bibliographic details
Published in: Concurrency and Computation: Practice and Experience, Vol. 26, Iss. 16, pp. 2713-2726
Main authors: Zou, Dan; Dou, Yong; Guo, Song; Li, Rongchun; Deng, Lin
Format: Journal Article
Language: English
Published: Blackwell Publishing Ltd, 01.11.2014
ISSN: 1532-0626, 1532-0634
Online access: Full text
Description
Abstract: Sparse Cholesky factorization is the most computationally intensive component in solving large sparse linear systems and is the core algorithm of numerous scientific computing applications. A large number of sparse Cholesky factorization algorithms have emerged, exploiting the architectural features of various computing platforms. The recent use of graphics processing units (GPUs) to accelerate structured parallel applications shows the potential for significant acceleration relative to desktop performance. However, sparse Cholesky factorization has not been explored sufficiently because of the complexity of an efficient implementation and concerns about low GPU utilization. In this paper, we present a new approach to sparse Cholesky factorization on GPUs. We describe the organization of the sparse-matrix supernode data structure for the GPU and propose a queue-based approach for generating and scheduling GPU tasks composed of dense linear algebra operations. We also design a subtree-based parallel method for multi-GPU systems. These approaches increase GPU utilization and thus substantially reduce computation time. Comparisons are made with existing parallel solvers on problems arising from practical applications. The experimental results show that the proposed approaches can substantially improve sparse Cholesky factorization performance on GPUs. Relative to a highly optimized parallel algorithm on a 12-core node, we obtained speedups of 1.59× to 2.31× with one GPU and 1.80× to 3.21× with two GPUs. Relative to a state-of-the-art supernodal solver for CPU-GPU heterogeneous platforms, we obtained speedups of 1.52× to 2.30× with one GPU and 2.15× to 2.76× with two GPUs. Concurrency and Computation: Practice and Experience, 2013. Copyright © 2013 John Wiley & Sons, Ltd.
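The queue-based task generation and scheduling mentioned in the abstract can be illustrated with a small host-side example. The C++ sketch below is not the authors' implementation; it only shows, under assumed data structures (Supernode, factorize_supernode), how supernodes of an elimination tree could be pushed onto a ready queue once all of their children are factorized and then dispatched as dense tasks (on a GPU, each such task would map to dense POTRF/TRSM/GEMM-style kernels).

// Hypothetical sketch of queue-based scheduling over supernodes of an
// elimination tree (not the paper's code). A supernode becomes "ready"
// when all of its children have been factorized; ready supernodes are
// pulled from a work queue and processed as dense tasks.
#include <cstdio>
#include <queue>
#include <vector>

struct Supernode {
    int parent;              // parent supernode in the elimination tree (-1 = root)
    int unfinished_children; // children not yet factorized
};

// Placeholder for the dense work of one supernode: factorize the diagonal
// block, solve the sub-diagonal panel, and update ancestor supernodes.
static void factorize_supernode(int s) {
    std::printf("factorizing supernode %d\n", s);
}

static void queue_based_factorization(std::vector<Supernode>& tree) {
    std::queue<int> ready;
    // Leaves (no pending children) seed the queue.
    for (int s = 0; s < (int)tree.size(); ++s)
        if (tree[s].unfinished_children == 0) ready.push(s);

    while (!ready.empty()) {
        int s = ready.front();
        ready.pop();
        factorize_supernode(s);  // on a GPU: launch the dense kernels for s
        int p = tree[s].parent;
        if (p >= 0 && --tree[p].unfinished_children == 0)
            ready.push(p);       // parent becomes ready once all children are done
    }
}

int main() {
    // Tiny example: supernodes 0 and 1 are leaves, supernode 2 is their parent (root).
    std::vector<Supernode> tree = {{2, 0}, {2, 0}, {-1, 2}};
    queue_based_factorization(tree);
    return 0;
}

In the same spirit, the subtree-based multi-GPU method described in the abstract could be thought of as assigning disjoint subtrees of this elimination tree to different GPUs, each running such a queue independently before the shared upper part of the tree is processed; the details of the authors' scheme are given in the paper itself.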
Identifiers: istex:28E4EBE69BC931F153728A2B7ADE96B1B45BCDE2
ArticleID: CPE3158
ark:/67375/WNG-Q4W5TMV8-B
DOI:10.1002/cpe.3158