Implementing sparse matrix-vector multiplication on throughput-oriented processors

Bibliographic details
Published in:	Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, pp. 1-11
Main authors:	Bell, Nathan; Garland, Michael
Format:	Conference paper
Language:	English
Published:	New York, NY, USA: ACM, 14 November 2009
Series:	ACM Conferences
Subjects:	Bandwidth; Computer systems organization > Architectures > Parallel architectures > Multiple instruction, multiple data; Computer systems organization > Dependable and fault-tolerant systems and networks; Computing methodologies > Symbolic and algebraic manipulation > Symbolic and algebraic algorithms > Linear algebra algorithms; General and reference > Cross-computing tools and techniques > Performance; Graphics processing units; Hardware; Instruction sets; Kernel; Mathematics of computing > Mathematical analysis > Numerical analysis > Computations on matrices; Memory management; Networks > Network performance evaluation; Optimization; Sparse matrices; Throughput; Vectors
ISBN:	1605587443, 9781605587448
ISSN:	2167-4329

Description
Summary:	Sparse matrix-vector multiplication (SpMV) is of singular importance in sparse linear algebra. In contrast to the uniform regularity of dense linear algebra, sparse operations encounter a broad spectrum of matrices ranging from the regular to the highly irregular. Harnessing the tremendous potential of throughput-oriented processors for sparse operations requires that we expose substantial fine-grained parallelism and impose sufficient regularity on execution paths and memory access patterns. We explore SpMV methods that are well-suited to throughput-oriented architectures like the GPU and which exploit several common sparsity classes. The techniques we propose are efficient, successfully utilizing large percentages of peak bandwidth. Furthermore, they deliver excellent total throughput, averaging 16 GFLOP/s and 10 GFLOP/s in double precision for structured grid and unstructured mesh matrices, respectively, on a GeForce GTX 285. This is roughly 2.8 times the throughput previously achieved on Cell BE and more than 10 times that of a quad-core Intel Clovertown system.
DOI:	10.1145/1654059.1654078