Developing a Multi-GPU-Enabled Preconditioned GMRES with Inexact Triangular Solves for Block Sparse Matrices

Bibliographic details
Title: Developing a Multi-GPU-Enabled Preconditioned GMRES with Inexact Triangular Solves for Block Sparse Matrices
Authors: Wenpeng Ma, Yiwen Hu, Wu Yuan, Xiazhen Liu
Source: Mathematical Problems in Engineering, Vol 2021 (2021)
Publisher information: Hindawi Limited
Publication year: 2021
Collection: Directory of Open Access Journals: DOAJ Articles
Subjects: Engineering (General). Civil engineering (General), TA1-2040, Mathematics, QA1-939
Description: Solving triangular systems is the building block for the preconditioned GMRES algorithm. Inexact preconditioning becomes attractive because of the feature of high parallelism on accelerators. In this paper, we propose and implement an iterative, inexact block triangular solve on multi-GPUs based on PETSc’s framework. In addition, by developing a distributed block sparse matrix-vector multiplication procedure and investigating the optimized vector operations, we form the multi-GPU-enabled preconditioned GMRES with the block Jacobi preconditioner. In the implementation, the GPU-Direct technique is employed to avoid host-device memory copies. The preconditioning step used by PETSc’s structure and the cuSPARSE library are also investigated for performance comparisons. The experiments show that the developed GMRES with inexact preconditioning on 8 GPUs can achieve up to 4.4x speedup over the CPU-only implementation with exact preconditioning using 8 MPI processes.
Document type: article in journal/newspaper
Language: English
Relation: http://dx.doi.org/10.1155/2021/6804723; https://doaj.org/toc/1024-123X; https://doaj.org/toc/1563-5147; https://doaj.org/article/0043c6fe61fa4f75866dc1fe6f37f4cd
DOI: 10.1155/2021/6804723
Availability: https://doi.org/10.1155/2021/6804723
https://doaj.org/article/0043c6fe61fa4f75866dc1fe6f37f4cd
Accession number: edsbas.14F878C1
Database: BASE
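
Illustrative note: the record's abstract refers to an iterative, inexact triangular solve that is attractive because of its parallelism. The abstract does not say which iteration the authors use, so the sketch below assumes a Jacobi-style sweep, one common choice. It is a scalar, dense, CPU-only illustration, not the block, multi-GPU PETSc implementation the paper describes, and the names `jacobi_lower_solve` and `forward_substitution` are hypothetical.

```python
def forward_substitution(L, b):
    """Exact solve of L x = b for a lower-triangular L (row by row, sequential)."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        # Row i needs the already-final values x[0..i-1]: an inherently serial chain.
        x[i] = (b[i] - sum(L[i][j] * x[j] for j in range(i))) / L[i][i]
    return x


def jacobi_lower_solve(L, b, sweeps):
    """Approximate solve of L x = b using a fixed number of Jacobi sweeps.

    Assumption for illustration: L is lower triangular with a nonzero diagonal.
    Each sweep computes x_new = D^{-1} (b - (L - D) x_old), where D = diag(L).
    """
    n = len(b)
    # Initial guess: ignore the off-diagonal part entirely.
    x = [b[i] / L[i][i] for i in range(n)]
    for _ in range(sweeps):
        # Every row reads only the previous iterate, so all rows are independent
        # and could be updated in parallel (one sparse matrix-vector product).
        x = [
            (b[i] - sum(L[i][j] * x[j] for j in range(i))) / L[i][i]
            for i in range(n)
        ]
    return x


if __name__ == "__main__":
    L = [
        [2.0, 0.0, 0.0, 0.0],
        [1.0, 4.0, 0.0, 0.0],
        [0.5, 1.0, 5.0, 0.0],
        [1.0, 0.0, 2.0, 4.0],
    ]
    b = [2.0, 9.0, 17.5, 23.0]
    print("exact   :", forward_substitution(L, b))
    for k in range(4):
        print(f"{k} sweeps:", jacobi_lower_solve(L, b, k))
```

Because the strictly lower part of a triangular matrix is nilpotent, this iteration reproduces the exact solution after at most n - 1 sweeps. Stopping earlier gives an inexact solve that trades accuracy for work that parallelizes well, which is the general trade-off the abstract alludes to.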