Recursion leads to automatic variable blocking for dense linear-algebra algorithms

We describe some modifications of the LAPACK dense linear-algebra algorithms using recursion. Recursion leads to automatic variable blocking. LAPACK's level-2 versions transform into level-3 codes by using recursion. The new recursive codes are written in FORTRAN 77, which does not support recu...

Celý popis

Uložené v:
Podrobná bibliografia
Vydané v:IBM journal of research and development Ročník 41; číslo 6; s. 737 - 755
Hlavný autor: Gustavson, F. G.
Médium: Journal Article
Jazyk:English
Vydavateľské údaje: Armonk, NY International Business Machines 01.11.1997
International Business Machines Corporation
Predmet:
ISSN:0018-8646, 0018-8646, 2151-8556
On-line prístup:Získať plný text
Tagy: Pridať tag
Žiadne tagy, Buďte prvý, kto otaguje tento záznam!
Popis
Shrnutí:We describe some modifications of the LAPACK dense linear-algebra algorithms using recursion. Recursion leads to automatic variable blocking. LAPACK's level-2 versions transform into level-3 codes by using recursion. The new recursive codes are written in FORTRAN 77, which does not support recursion as a language feature. Gaussian elimination with partial pivoting and Cholesky factorization are considered. Very clear algorithms emerge with the use of recursion. The recursive codes do exactly the same computation as the LAPACK codes, and a single recursive code replaces both the level-2 and level-3 versions of the corresponding LAPACK codes. We present an analysis of the recursive algorithm in terms of both FLOP count and storage usage. The matrix operands are more `squarish' using recursion. The total area of the submatrices used in the recursive algorithm is less than the total area used by the LAPACK level-3 right-/left-looking algorithms. We quantify the difference; we also quantify how the FLOPS are computed. Also, we show that the algorithms exhibit high performance on RISC-type processors. In fact, except for small matrices, the recursive version outperforms the level-3 LAPACK versions of DGETRF and DPOTRF on an RS /6000 super(TM) workstation. For the level-2 versions, the performance gain approaches a factor of 3. We also demonstrate that a change to the LAPACK DLASWP routine can improve the performance of both the recursive version and DGETRF by more than 15 percent.
Bibliografia:SourceType-Scholarly Journals-1
ObjectType-Feature-1
content type line 14
ObjectType-Article-2
content type line 23
ISSN:0018-8646
0018-8646
2151-8556
DOI:10.1147/rd.416.0737