Reconstructing Householder vectors from Tall-Skinny QR

The Tall-Skinny QR (TSQR) algorithm is more communication efficient than the standard Householder algorithm for QR decomposition of matrices with many more rows than columns. However, TSQR produces a different representation of the orthogonal factor and therefore requires more software development t...

Celý popis

Uložené v:
Podrobná bibliografia
Vydané v:Journal of parallel and distributed computing Ročník 85; číslo C; s. 3 - 31
Hlavní autori: Ballard, G., Demmel, J., Grigori, L., Jacquelin, M., Knight, N., Nguyen, H.D.
Médium: Journal Article
Jazyk:English
Vydavateľské údaje: United States Elsevier Inc 01.11.2015
Elsevier
Edícia:IPDPS 2014 Selected Papers on Numerical and Combinatorial Algorithms
Predmet:
ISSN:0743-7315, 1096-0848
On-line prístup:Získať plný text
Tagy: Pridať tag
Žiadne tagy, Buďte prvý, kto otaguje tento záznam!
Popis
Shrnutí:The Tall-Skinny QR (TSQR) algorithm is more communication efficient than the standard Householder algorithm for QR decomposition of matrices with many more rows than columns. However, TSQR produces a different representation of the orthogonal factor and therefore requires more software development to support the new representation. Further, implicitly applying the orthogonal factor to the trailing matrix in the context of factoring a square matrix is more complicated and costly than with the Householder representation. We show how to perform TSQR and then reconstruct the Householder vector representation with the same asymptotic communication efficiency and little extra computational cost. We demonstrate the high performance and numerical stability of this algorithm both theoretically and empirically. The new Householder reconstruction algorithm allows us to design more efficient parallel QR algorithms, with significantly lower latency cost compared to Householder QR and lower bandwidth and latency costs compared with Communication-Avoiding QR (CAQR) algorithm. Experiments on supercomputers demonstrate the benefits of the communication cost improvements: in particular, our experiments show substantial improvements over tuned library implementations for tall-and-skinny matrices. We also provide algorithmic improvements to the Householder QR and CAQR algorithms, and we investigate several alternatives to the Householder reconstruction algorithm that sacrifice guarantees on numerical stability in some cases in order to obtain higher performance. •We reconstruct Householder vectors representing the Q-factor from Tall-Skinny QR.•Our approach has the same asymptotic communication efficiency as TSQR.•Additionally, it enables more communication-efficient parallel QR algorithms.•We also provide algorithmic improvements to the Householder QR and CAQR algorithms.
Bibliografia:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
AC04-94AL85000; AC02-05CH11231; SC0008700; SC0010200
USDOE National Nuclear Security Administration (NNSA)
SAND-2015-1977J
ISSN:0743-7315
1096-0848
DOI:10.1016/j.jpdc.2015.06.003