Efficient algorithms for computing rank‐revealing factorizations on a GPU

Standard rank‐revealing factorizations such as the singular value decomposition (SVD) and column pivoted QR factorization are challenging to implement efficiently on a GPU. A major difficulty in this regard is the inability of standard algorithms to cast most operations in terms of the Level‐3 BLAS....

Bibliographic Details
Published in: Numerical linear algebra with applications, Vol. 30, no. 6
Main Authors: Heavner, Nathan; Chen, Chao; Gopal, Abinand; Martinsson, Per‐Gunnar
Format: Journal Article
Language: English
Published: Oxford: Wiley Subscription Services, Inc., 01.12.2023
ISSN:1070-5325, 1099-1506
Abstract Standard rank‐revealing factorizations such as the singular value decomposition (SVD) and column pivoted QR factorization are challenging to implement efficiently on a GPU. A major difficulty in this regard is the inability of standard algorithms to cast most operations in terms of the Level‐3 BLAS. This article presents two alternative algorithms for computing a rank‐revealing factorization of the form $\mathbf{A} = \mathbf{U}\mathbf{T}\mathbf{V}^{*}$, where $\mathbf{U}$ and $\mathbf{V}$ are orthogonal and $\mathbf{T}$ is trapezoidal (or triangular if $\mathbf{A}$ is square). Both algorithms use randomized projection techniques to cast most of the flops in terms of matrix‐matrix multiplication, which is exceptionally efficient on the GPU. Numerical experiments illustrate that these algorithms achieve significant acceleration over finely tuned GPU implementations of the SVD while providing low rank approximation errors close to that of the SVD.
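The factorization described in the abstract can be illustrated with a small CPU sketch. The code below is not taken from the article and may differ from the authors' two algorithms; it is a generic randomized UTV-style construction, assuming a real matrix, in which a Gaussian test matrix is pushed through a few power iterations and two unpivoted QR factorizations produce the orthogonal factors. The function name `power_urv` and the parameters `q` and `seed` are hypothetical, chosen only for this illustration. Apart from the QR calls, the work is matrix‐matrix multiplication, which is the property the abstract highlights.

```python
import numpy as np


def power_urv(A, q=1, seed=0):
    """Illustrative randomized factorization A = U @ T @ V.T (real A).

    Hypothetical sketch: q power iterations on a Gaussian test matrix,
    then two unpivoted QR factorizations.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape

    # Gaussian test matrix; its column space is rotated toward the
    # dominant right singular directions of A by the power iteration.
    Y = rng.standard_normal((n, n))
    for _ in range(q):
        Y = A.T @ (A @ Y)        # matrix-matrix products (Level-3 BLAS)
        Y, _ = np.linalg.qr(Y)   # re-orthonormalize to limit round-off

    # Orthogonal V from the (randomized) test matrix.
    V, _ = np.linalg.qr(Y)

    # Unpivoted QR of A @ V gives orthogonal U and upper trapezoidal T.
    U, T = np.linalg.qr(A @ V, mode="complete")
    return U, T, V
```

With an m-by-n input where m is at least n, `U` is m-by-m, `V` is n-by-n, and `T` is m-by-n with zeros below the diagonal, so `U @ T @ V.T` reproduces `A` up to rounding. Larger `q` generally makes the diagonal of `T` track the singular values more closely, at the cost of more multiplications.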
Author_xml – sequence: 1
  givenname: Nathan
  surname: Heavner
  fullname: Heavner, Nathan
  organization: Department of Applied Mathematics University of Colorado at Boulder Boulder Colorado USA
– sequence: 2
  givenname: Chao
  orcidid: 0000-0002-5385-3651
  surname: Chen
  fullname: Chen, Chao
  organization: Oden Institute University of Texas at Austin Austin Texas USA
– sequence: 3
  givenname: Abinand
  surname: Gopal
  fullname: Gopal, Abinand
  organization: Department of Mathematics Yale University New Haven Connecticut USA
– sequence: 4
  givenname: Per‐Gunnar
  surname: Martinsson
  fullname: Martinsson, Per‐Gunnar
  organization: Oden Institute & Department of Mathematics University of Texas at Austin Austin Texas USA
ContentType Journal Article
Copyright 2023 John Wiley & Sons, Ltd.
DOI 10.1002/nla.2515
Discipline Mathematics
EISSN 1099-1506
IsPeerReviewed true
IsScholarly true
SubjectTerms Algorithms
Computation
Factorization
Matrices (mathematics)
Singular value decomposition
Title Efficient algorithms for computing rank‐revealing factorizations on a GPU
URI https://www.proquest.com/docview/2885374837