Optimizing the SVD Bidiagonalization Process for a Batch of Small Matrices
A challenging class of problems arising in many GPU applications, called batched problems, involves linear algebra operations on many small-sized matrices. We designed batched BLAS (Basic Linear Algebra Subroutines) routines, and in particular the Level-2 BLAS GEMV and the Level-3 BLAS GEMM routines...
Saved in:
| Published in: | Procedia computer science Vol. 108; pp. 1008 - 1018 |
|---|---|
| Main Authors: | , , , |
| Format: | Journal Article |
| Language: | English |
| Published: |
Elsevier B.V
2017
|
| Subjects: | |
| ISSN: | 1877-0509, 1877-0509 |
| Online Access: | Get full text |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| Summary: | A challenging class of problems arising in many GPU applications, called batched problems, involves linear algebra operations on many small-sized matrices. We designed batched BLAS (Basic Linear Algebra Subroutines) routines, and in particular the Level-2 BLAS GEMV and the Level-3 BLAS GEMM routines, to solve them. We proposed device functions and big-tile settings in our batched BLAS design. We adopted auto-tuning to optimize different instances of GEMV routines. We illustrated our batched BLAS approach to optimize batched bi-diagonalization progressively on a K40c GPU. The optimization techniques in this paper are applicable to the other two-sided factorizations as well. |
|---|---|
| ISSN: | 1877-0509 1877-0509 |
| DOI: | 10.1016/j.procs.2017.05.237 |