Optimizing the SVD Bidiagonalization Process for a Batch of Small Matrices

Bibliographic details
Published in:	Procedia Computer Science, Vol. 108, pp. 1008-1018
Main authors:	Dong, Tingxing; Haidar, Azzam; Tomov, Stanimire; Dongarra, Jack
Format:	Journal Article
Language:	English
Published:	Elsevier B.V., 2017
Subjects:	batched two-sided factorization algorithms; Hardware accelerators; Singular Value Problems
ISSN:	1877-0509

Description
Abstract:	A challenging class of problems arising in many GPU applications, called batched problems, involves linear algebra operations on many small matrices. We designed batched BLAS (Basic Linear Algebra Subprograms) routines, in particular the Level-2 BLAS GEMV and the Level-3 BLAS GEMM routines, to solve them. We proposed device functions and big-tile settings in our batched BLAS design. We adopted auto-tuning to optimize different instances of the GEMV routines. We illustrated our batched BLAS approach by progressively optimizing batched bidiagonalization on a K40c GPU. The optimization techniques in this paper are applicable to the other two-sided factorizations as well.
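To show where GEMV enters a batched bidiagonalization, here is a minimal CPU reference sketch in plain Python. It assumes every matrix in the batch has the same shape m x n with m >= n, and it uses unblocked Householder bidiagonalization (in the spirit of LAPACK's xGEBD2, not the blocked xGEBRD). The names `gemv`, `batched_gemv`, `householder`, and `batched_bidiagonalize` are hypothetical illustrations, not the MAGMA API or the authors' GPU kernels. Each reflector application needs one matrix-vector product (A^T v from the left, A v from the right) followed by a rank-1 update, and advancing all matrices in lockstep turns those products into one batched GEMV per step.

```python
import math

def gemv(trans, alpha, A, x, beta, y):
    """Reference y := alpha*op(A)*x + beta*y for one small dense matrix (list of rows)."""
    m, n = len(A), len(A[0])
    if trans:  # op(A) = A^T, result has length n
        return [alpha * sum(A[i][j] * x[i] for i in range(m)) + beta * y[j] for j in range(n)]
    return [alpha * sum(A[i][j] * x[j] for j in range(n)) + beta * y[i] for i in range(m)]

def batched_gemv(trans, alpha, As, xs, beta, ys):
    """Hypothetical batched GEMV: independent GEMVs, one per matrix (one GPU thread block each, conceptually)."""
    return [gemv(trans, alpha, A, x, beta, y) for A, x, y in zip(As, xs, ys)]

def householder(x):
    """Return (v, tau, beta) with v[0] = 1 and (I - tau v v^T) x = beta e_1 (LAPACK xLARFG convention)."""
    alpha = x[0]
    sigma = sum(t * t for t in x[1:])
    if sigma == 0.0:  # nothing to annihilate: H = I
        return [1.0] + [0.0] * (len(x) - 1), 0.0, alpha
    norm = math.sqrt(alpha * alpha + sigma)
    beta = -norm if alpha >= 0 else norm  # sign choice avoids cancellation
    v = [1.0] + [t / (alpha - beta) for t in x[1:]]
    return v, (beta - alpha) / beta, beta

def batched_bidiagonalize(As):
    """Reduce each m x n matrix (m >= n) to upper-bidiagonal form, all matrices in lockstep."""
    As = [[row[:] for row in A] for A in As]
    m, n = len(As[0]), len(As[0][0])
    for k in range(n):
        # Left reflectors zero column k below the diagonal: w = A[k:, k:]^T v, then A -= tau v w^T.
        refl = [householder([A[i][k] for i in range(k, m)]) for A in As]
        subs = [[row[k:] for row in A[k:]] for A in As]
        ws = batched_gemv(True, 1.0, subs, [r[0] for r in refl], 0.0,
                          [[0.0] * (n - k) for _ in As])
        for A, (v, tau, _), w in zip(As, refl, ws):
            for i in range(k, m):
                for j in range(k, n):
                    A[i][j] -= tau * v[i - k] * w[j - k]
        if k < n - 2:
            # Right reflectors zero row k right of the superdiagonal: w = A[k:, k+1:] v, then A -= tau w v^T.
            refl = [householder(A[k][k + 1:]) for A in As]
            subs = [[row[k + 1:] for row in A[k:]] for A in As]
            ws = batched_gemv(False, 1.0, subs, [r[0] for r in refl], 0.0,
                              [[0.0] * (m - k) for _ in As])
            for A, (v, tau, _), w in zip(As, refl, ws):
                for i in range(k, m):
                    for j in range(k + 1, n):
                        A[i][j] -= tau * w[i - k] * v[j - k - 1]
    return As
```

Because every matrix in the batch is small, a single GEMV per matrix cannot saturate a GPU. Issuing the products for all matrices as one batched call is the kind of workload that batched GEMV routines are designed to handle efficiently.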
DOI:	10.1016/j.procs.2017.05.237