Sparse Matrix-vector Multiplication on GPGPU Clusters: A New Storage Format and a Scalable Implementation

Bibliographic details
Published in:	2012 26th IEEE International Parallel and Distributed Processing Symposium Workshops, pp. 1696-1702
Main authors:	Kreutzer, M., Hager, G., Wellein, G., Fehske, H., Basermann, A., Bishop, A. R.
Format:	Conference paper
Language:	English
Publication details:	IEEE, 1 May 2012
Subjects:	Bandwidth; Computational modeling; CUDA; Error correction codes; GPGPU; Instruction sets; Kernel; Sparse matrices; Vectors
ISBN:	1467309745, 9781467309745

Description
Summary:	Sparse matrix-vector multiplication (spMVM) is the dominant operation in many sparse solvers. We investigate performance properties of spMVM with matrices of various sparsity patterns on the nVidia "Fermi" class of GPGPUs. A new "padded jagged diagonals storage" (pJDS) format is proposed which may substantially reduce the memory overhead intrinsic to the widespread ELLPACK-R scheme while making no assumptions about the matrix structure. In our test scenarios the pJDS format cuts the overall spMVM memory footprint on the GPGPU by up to 70%, and achieves 91% to 130% of the ELLPACK-R performance. Using a suitable performance model we identify performance bottlenecks on the node level that invalidate some types of matrix structures for efficient multi-GPGPU parallelization. For appropriate sparsity patterns we extend previous work on distributed-memory parallel spMVM to demonstrate a scalable hybrid MPI-GPGPU code, achieving efficient overlap of communication and computation.
DOI:	10.1109/IPDPSW.2012.211
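
Note on the storage formats: the memory saving mentioned in the summary can be illustrated with a small sketch. This is not the authors' implementation. It assumes that an ELLPACK-style layout pads every row to the length of the longest row in the matrix, and that pJDS sorts rows by descending number of nonzeros and pads each block of consecutive rows only to the longest row in that block. The function names, the `block_rows` parameter, and the toy row lengths are hypothetical, and the sketch counts stored elements only (no index arrays, no padding of the row count to a multiple of the block size).

```python
def ellpack_slots(row_lengths):
    """Stored elements when every row is padded to the globally longest row."""
    if not row_lengths:
        return 0
    return len(row_lengths) * max(row_lengths)


def pjds_slots(row_lengths, block_rows=32):
    """Stored elements when rows are sorted by descending length and each
    block of `block_rows` consecutive rows is padded to its longest row.

    Assumption: this mirrors the padding idea of pJDS in simplified form.
    """
    ordered = sorted(row_lengths, reverse=True)
    total = 0
    for start in range(0, len(ordered), block_rows):
        block = ordered[start:start + block_rows]
        # After sorting, the first row of a block is its longest row.
        total += len(block) * block[0]
    return total


# Hypothetical toy matrix: number of nonzeros in each of 8 rows.
row_lengths = [1, 8, 2, 2, 1, 3, 1, 1]
nonzeros = sum(row_lengths)
ell = ellpack_slots(row_lengths)
pjds = pjds_slots(row_lengths, block_rows=4)
print(f"nonzeros={nonzeros}, ELLPACK-style slots={ell}, pJDS-style slots={pjds}")
```

For this toy input a single long row forces the ELLPACK-style layout to store 64 elements for 19 nonzeros, while the block-wise padding stores 36. The figures reported in the summary come from the authors' own test matrices and are not reproduced by this sketch.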