Reconstructing Householder vectors from Tall-Skinny QR

Bibliographic details
Published in: Journal of parallel and distributed computing, Volume 85, Issue C, pp. 3-31
Main authors: Ballard, G.; Demmel, J.; Grigori, L.; Jacquelin, M.; Knight, N.; Nguyen, H.D.
Format: Journal Article
Language: English
Published: United States, Elsevier Inc; Elsevier, 2015-11-01
Series: IPDPS 2014 Selected Papers on Numerical and Combinatorial Algorithms
ISSN: 0743-7315, 1096-0848
Abstract: The Tall-Skinny QR (TSQR) algorithm is more communication efficient than the standard Householder algorithm for QR decomposition of matrices with many more rows than columns. However, TSQR produces a different representation of the orthogonal factor and therefore requires more software development to support the new representation. Further, implicitly applying the orthogonal factor to the trailing matrix in the context of factoring a square matrix is more complicated and costly than with the Householder representation. We show how to perform TSQR and then reconstruct the Householder vector representation with the same asymptotic communication efficiency and little extra computational cost. We demonstrate the high performance and numerical stability of this algorithm both theoretically and empirically. The new Householder reconstruction algorithm allows us to design more efficient parallel QR algorithms, with significantly lower latency cost compared with Householder QR and lower bandwidth and latency costs compared with the Communication-Avoiding QR (CAQR) algorithm. Experiments on supercomputers demonstrate the benefits of the communication cost improvements: in particular, our experiments show substantial improvements over tuned library implementations for tall-and-skinny matrices. We also provide algorithmic improvements to the Householder QR and CAQR algorithms, and we investigate several alternatives to the Householder reconstruction algorithm that sacrifice guarantees on numerical stability in some cases in order to obtain higher performance.
• We reconstruct Householder vectors representing the Q-factor from Tall-Skinny QR.
• Our approach has the same asymptotic communication efficiency as TSQR.
• Additionally, it enables more communication-efficient parallel QR algorithms.
• We also provide algorithmic improvements to the Householder QR and CAQR algorithms.
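The idea in the abstract (run TSQR, then recover a Householder representation of the orthogonal factor) can be illustrated with a small sequential sketch. It is not the parallel algorithm from the article. The function names `tsqr` and `reconstruct_householder` are hypothetical, the single-level block reduction is a simplification, and the reconstruction step assumes one particular approach that the abstract does not spell out: an LU factorization without pivoting of Q minus a diagonal sign matrix S, which yields a compact WY form I - Y T Y^T.

```python
import numpy as np

def tsqr(A, num_blocks=4):
    """Single-level TSQR sketch: QR of each row block, then QR of the stacked R factors.
    Returns the thin Q (m x n), formed explicitly, and R (n x n).
    Assumes every row block has at least n rows."""
    n = A.shape[1]
    blocks = np.array_split(A, num_blocks, axis=0)
    local = [np.linalg.qr(B) for B in blocks]          # reduced QR per block
    Q2, R = np.linalg.qr(np.vstack([Ri for _, Ri in local]))
    # Combine each local Q with its n x n slice of the second-level Q.
    Q = np.vstack([Qi @ Q2[i * n:(i + 1) * n] for i, (Qi, _) in enumerate(local)])
    return Q, R

def reconstruct_householder(Q):
    """Assumed reconstruction: LU without pivoting of Q - [S; 0], with signs S chosen
    on the fly so each pivot is large. Returns Y (unit lower trapezoidal),
    T (upper triangular) and the sign vector s, with (I - Y T Y^T)[:, :n] = Q S."""
    m, n = Q.shape
    W = Q.astype(float).copy()
    s = np.empty(n)
    for j in range(n):
        s[j] = -1.0 if W[j, j] >= 0 else 1.0            # sign opposite to the pivot
        W[j, j] -= s[j]                                 # shifted pivot, magnitude >= 1
        W[j + 1:, j] /= W[j, j]                         # multipliers (column of Y)
        W[j + 1:, j + 1:] -= np.outer(W[j + 1:, j], W[j, j + 1:])
    Y = np.tril(W, -1)
    Y[np.arange(n), np.arange(n)] = 1.0
    U = np.triu(W[:n])
    # T = -U S Y1^{-T}, computed through a solve with the top block Y1 = Y[:n].
    T = np.linalg.solve(Y[:n], -(U * s).T).T
    return Y, T, s

# Small demonstration on a random tall-and-skinny matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((64, 4))
Q, R = tsqr(A)
Y, T, s = reconstruct_householder(Q)
H = np.eye(64) - Y @ T @ Y.T                            # full orthogonal factor
R_h = s[:, None] * R                                    # R with rows rescaled by S
print("residual:", np.linalg.norm(H[:, :4] @ R_h - A))
```

In this sketch the sign vector `s` matters: the reconstructed factor reproduces A only when the rows of R are rescaled by the same signs, so the pair (Y, T) and the rescaled R should be kept together.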
Authors:
Ballard, G. (gmballa@sandia.gov), Sandia National Laboratories, United States
Demmel, J. (demmel@cs.berkeley.edu), UC Berkeley, United States
Grigori, L. (laura.grigori@inria.fr), INRIA Paris - Rocquencourt, France
Jacquelin, M. (mathias.jacquelin@lbl.gov), Lawrence Berkeley National Laboratory, United States
Knight, N. (knight@cs.berkeley.edu), UC Berkeley, United States
Nguyen, H.D. (hdnguyen@cs.berkeley.edu), UC Berkeley, United States
View record in HAL: https://inria.hal.science/hal-01241785
View record in OSTI.GOV: https://www.osti.gov/servlets/purl/1236219
Content type: Journal Article
Copyright: 2015 Elsevier Inc. Distributed under a Creative Commons Attribution 4.0 International License
Corporate author: Sandia National Lab. (SNL-CA), Livermore, CA (United States)
DOI: 10.1016/j.jpdc.2015.06.003
Discipline: Computer Science
Open access: yes
Peer reviewed: yes
Scholarly: yes
Keywords: Communication-avoiding algorithms; QR decomposition; Dense linear algebra
Language: English
License: Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0
Notes: AC04-94AL85000; AC02-05CH11231; SC0008700; SC0010200
USDOE National Nuclear Security Administration (NNSA)
SAND-2015-1977J
ORCID: 0000-0002-5880-1076
Open access link: https://www.osti.gov/servlets/purl/1236219
Subject terms: Algorithms; Asymptotic properties; Communication-avoiding algorithms; Computer Science; Dense linear algebra; Distributed, Parallel, and Cluster Computing; Mathematical analysis; Mathematical models; MATHEMATICS AND COMPUTING; Numerical stability; QR decomposition; Reconstruction; Representations; Vectors (mathematics)