Reconstructing Householder vectors from Tall-Skinny QR
| Published in: | Journal of Parallel and Distributed Computing, Vol. 85, Issue C, pp. 3-31 |
|---|---|
| Main authors: | Ballard, G.; Demmel, J.; Grigori, L.; Jacquelin, M.; Knight, N.; Nguyen, H.D. |
| Format: | Journal Article |
| Language: | English |
| Published: | United States: Elsevier Inc., 1 November 2015 |
| Series: | IPDPS 2014 Selected Papers on Numerical and Combinatorial Algorithms |
| ISSN: | 0743-7315 (print), 1096-0848 (online) |
| Abstract | The Tall-Skinny QR (TSQR) algorithm is more communication-efficient than the standard Householder algorithm for the QR decomposition of matrices with many more rows than columns. However, TSQR produces a different representation of the orthogonal factor and therefore requires more software development to support the new representation. Further, implicitly applying the orthogonal factor to the trailing matrix in the context of factoring a square matrix is more complicated and costly than with the Householder representation.
We show how to perform TSQR and then reconstruct the Householder vector representation with the same asymptotic communication efficiency and little extra computational cost. We demonstrate the high performance and numerical stability of this algorithm both theoretically and empirically. The new Householder reconstruction algorithm allows us to design more efficient parallel QR algorithms, with significantly lower latency cost than Householder QR and lower bandwidth and latency costs than the Communication-Avoiding QR (CAQR) algorithm. Experiments on supercomputers demonstrate the benefits of these communication cost improvements; in particular, they show substantial improvements over tuned library implementations for tall-and-skinny matrices. We also provide algorithmic improvements to the Householder QR and CAQR algorithms, and we investigate several alternatives to the Householder reconstruction algorithm that, in some cases, sacrifice guarantees on numerical stability to obtain higher performance.
• We reconstruct Householder vectors representing the Q-factor from Tall-Skinny QR.
• Our approach has the same asymptotic communication efficiency as TSQR.
• Additionally, it enables more communication-efficient parallel QR algorithms.
• We also provide algorithmic improvements to the Householder QR and CAQR algorithms. |
| Authors | Ballard, G. (Sandia National Laboratories, United States; gmballa@sandia.gov); Demmel, J. (UC Berkeley, United States; demmel@cs.berkeley.edu); Grigori, L. (INRIA Paris - Rocquencourt, France; laura.grigori@inria.fr); Jacquelin, M. (Lawrence Berkeley National Laboratory, United States; mathias.jacquelin@lbl.gov); Knight, N. (UC Berkeley, United States; knight@cs.berkeley.edu); Nguyen, H.D. (UC Berkeley, United States; hdnguyen@cs.berkeley.edu) |
| Copyright | 2015 Elsevier Inc. Distributed under a Creative Commons Attribution 4.0 International License |
| Corporate author | Sandia National Lab. (SNL-CA), Livermore, CA (United States) |
| DOI | 10.1016/j.jpdc.2015.06.003 |
| Cited by (Web of Science) | 15 |
| Access | Open access (including via the DOI); peer reviewed; scholarly |
| Keywords | Communication-avoiding algorithms; QR decomposition; Dense linear algebra |
| License | Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0 |
| Notes | Sponsor: USDOE National Nuclear Security Administration (NNSA); contract/award numbers: AC04-94AL85000; AC02-05CH11231; SC0008700; SC0010200; report number: SAND-2015-1977J |
| ORCID | 0000-0002-5880-1076 |
| Open access link | https://www.osti.gov/servlets/purl/1236219 |
| Subjects | Algorithms; Asymptotic properties; Communication-avoiding algorithms; Computer Science; Dense linear algebra; Distributed, Parallel, and Cluster Computing; Mathematical analysis; Mathematical models; MATHEMATICS AND COMPUTING; Numerical stability; QR decomposition; Reconstruction; Representations; Vectors (mathematics) |
| Links | https://dx.doi.org/10.1016/j.jpdc.2015.06.003; https://www.proquest.com/docview/1778036077; https://inria.hal.science/hal-01241785; https://www.osti.gov/servlets/purl/1236219 |
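The abstract says the paper performs TSQR and then reconstructs the Householder vector representation, but this record does not spell out the procedure. The NumPy sketch below shows one standard way to do it. It is an illustration under our own assumptions, not the authors' implementation. A sequential one-level (flat) reduction tree stands in for a parallel reduction tree.

The reconstruction relies on the compact WY form I - Y T Y^T, whose first n columns always equal [I; 0] - Y T Y1^T, where Y1 is the top n x n block of Y. Suppose Q S equals those columns for some diagonal sign matrix S. Then Q - [S; 0] = Y U with U = -T Y1^T S, which is an LU factorization with Y unit lower trapezoidal. Running unpivoted LU on Q - [S; 0] therefore yields Y, and T = -U S Y1^{-T}. The sketch assumes each sign is picked from the current pivot so that every pivot has magnitude at least 1. The names `tsqr` and `reconstruct_householder` are hypothetical.

```python
import numpy as np


def tsqr(A, num_blocks=4):
    """Flat-tree TSQR sketch (hypothetical helper): A = Q @ R with an explicit thin Q.

    Each row block gets a local QR; the stacked n x n R factors get one more QR.
    Assumes every row block has at least n rows.
    """
    n = A.shape[1]
    blocks = np.array_split(A, num_blocks, axis=0)
    local = [np.linalg.qr(B) for B in blocks]                  # local QR of each row block
    Q2, R = np.linalg.qr(np.vstack([Ri for _, Ri in local]))   # reduce the stacked R factors
    # Block i of Q is Q_i times the i-th n x n block of Q2.
    Q = np.vstack([Qi @ Q2[i * n:(i + 1) * n] for i, (Qi, _) in enumerate(local)])
    return Q, R


def reconstruct_householder(Q):
    """Recover (Y, T, s) from an m x n Q with orthonormal columns (hypothetical helper).

    Unpivoted LU of Q - [diag(s); 0]. Each sign s_k is chosen from the current
    pivot so that |pivot| = |w_kk| + 1 >= 1. On return, I - Y T Y^T is orthogonal
    and its first n columns equal Q @ diag(s).
    """
    m, n = Q.shape
    W = np.array(Q, dtype=float)
    s = np.empty(n)
    for k in range(n):
        s[k] = -1.0 if W[k, k] >= 0 else 1.0
        W[k, k] -= s[k]                                            # subtract S on the fly
        W[k + 1:, k] /= W[k, k]                                    # multipliers = column k of Y
        W[k + 1:, k + 1:] -= np.outer(W[k + 1:, k], W[k, k + 1:])  # Schur complement update
    Y = np.tril(W, -1) + np.eye(m, n)   # unit lower trapezoidal
    U = np.triu(W[:n])                  # upper triangular, Q - [S; 0] = Y @ U
    # T = -U S Y1^{-T}: solve Y1 X = (U S)^T, then T = -X^T.
    T = -np.linalg.solve(Y[:n], (U * s).T).T
    return Y, T, s
```

With these outputs, `H = np.eye(m) - Y @ T @ Y.T` should be orthogonal up to rounding and `H[:, :n]` should match `Q * s`. The matching triangular factor is `s[:, None] * R`, so A is recovered as `H[:, :n] @ (s[:, None] * R)`. The general `np.linalg.solve` call replaces a dedicated triangular solve only to avoid a SciPy dependency. A parallel TSQR would combine the local R factors pairwise in a tree rather than in one flat step.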