On iterative QR pre-processing in the parallel block-Jacobi SVD algorithm

Published in: Parallel computing, Volume 36, Issue 5, pp. 297-307
Main authors: Bečka, Martin; Okša, Gabriel; Vajteršic, Marián; Grigori, Laura
Format: Journal Article
Language: English
Publication details: Elsevier B.V., 01.06.2010
ISSN: 0167-8191, 1872-7336
Abstract An efficient version of the parallel two-sided block-Jacobi algorithm for the singular value decomposition of an m × n matrix A includes a pre-processing step, which consists of the QR factorization of A with column pivoting followed by an optional LQ factorization of the R-factor. The iterative two-sided block-Jacobi algorithm is then applied in parallel to the R-factor (or L-factor). For the efficient computation of the parallel QR (or LQ) factorization with (or without) column pivoting as implemented in ScaLAPACK, a block-cyclic distribution of the matrix on an r × c process grid with p = r × c, r, c ⩾ 1, and block size n_b × n_b is required so that all processors remain busy during the whole parallel QR (or LQ) factorization. Optimal values of the parameters r, c and n_b are estimated experimentally using matrices of order n = 4000 and 8000 and p = 8 and 16 processors, respectively. It turns out that the optimal values are about n_b = 100 and r ⩽ c with both r and c near √p. These parameters are then used in numerical experiments for six different distributions of singular values combined with well-conditioned (κ = 10^1) and ill-conditioned (κ = 10^8) matrices. It is shown that, with optimal parameters in the pre-processing step, the parallel two-sided block-Jacobi SVD algorithm performs better than (or as well as) the ScaLAPACK routine PDGESVD for matrices with a multiple minimal/maximal singular value, regardless of the condition number. For other distributions of singular values, our algorithm is slower than the ScaLAPACK routine. The un-pivoted QRLQ pre-processing step is then reformulated and extended to the QR iteration, and its connection to the QR algorithm applied to specific symmetric, positive definite matrices is shown. This connection helps to explain observations in another set of experiments with a variable number of QR iteration steps.
In general, the best results for all six distributions of singular values are achieved by using about six QR iteration steps in pre-processing.
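The two ingredients named in the abstract can be illustrated with a minimal serial sketch; it is not the authors' parallel code. It assumes the standard two-dimensional block-cyclic mapping with zero-based indices and the first block on process (0, 0), and it reads "QR iteration" as repeatedly taking the QR factorization of the transpose of the current triangular factor, without pivoting. The names choose_grid, owner, qr_r_factor and qr_iteration are hypothetical, the grid heuristic only mirrors the reported "r ⩽ c near √p" rather than the experimental tuning, and the small test matrix A is made up for illustration.

```python
import math


def choose_grid(p):
    """Return (r, c) with r * c == p, r <= c and r the largest divisor <= sqrt(p)."""
    r = math.isqrt(p)
    while p % r != 0:
        r -= 1
    return r, p // r


def owner(i, j, nb, r, c):
    """Grid coordinates of the process owning global entry (i, j), block size nb x nb."""
    return (i // nb) % r, (j // nb) % c


def transpose(M):
    return [list(row) for row in zip(*M)]


def qr_r_factor(M):
    """Upper triangular R of M = Q R by modified Gram-Schmidt (full column rank assumed)."""
    n = len(M[0])
    cols = [list(col) for col in zip(*M)]  # working copies of the columns of M
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        R[j][j] = math.sqrt(sum(x * x for x in cols[j]))
        q = [x / R[j][j] for x in cols[j]]
        for k in range(j + 1, n):
            R[j][k] = sum(a * b for a, b in zip(q, cols[k]))
            cols[k] = [b - R[j][k] * a for a, b in zip(q, cols[k])]
    return R


def qr_iteration(A, steps):
    """QR of A, then `steps` further QR factorizations of the transposed factor.

    With steps == 1 this is the un-pivoted QRLQ step: the returned matrix is the
    transpose of the L-factor of R. Every step preserves the singular values.
    """
    R = qr_r_factor(A)
    for _ in range(steps):
        R = qr_r_factor(transpose(R))
    return R


def fro_norm(M):
    return math.sqrt(sum(x * x for row in M for x in row))


def offdiag_norm(M):
    return math.sqrt(sum(x * x for i, row in enumerate(M)
                         for j, x in enumerate(row) if i != j))


# Made-up 3 x 3 example with well-separated singular values.
A = [[10.0, 1.0, 0.0],
     [1.0, 1.0, 0.1],
     [0.0, 0.1, 0.1]]

if __name__ == "__main__":
    print("grid for p = 8:", choose_grid(8))
    for steps in (0, 1, 6):
        print(steps, "steps, off-diagonal norm:", offdiag_norm(qr_iteration(A, steps)))
```

In this sketch, if R_k^T = Q R_{k+1}, then R_{k+1}^T R_{k+1} = R_k R_k^T, so the symmetric positive definite matrix R_k^T R_k is replaced by the product of its Cholesky factors in reverse order. That is one way to see why additional steps push the triangular factor toward diagonal form before the Jacobi iteration starts.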
Author Vajteršic, Marián
Bečka, Martin
Grigori, Laura
Okša, Gabriel
Author_xml – sequence: 1
  givenname: Martin
  surname: Bečka
  fullname: Bečka, Martin
  organization: Institute of Mathematics, Dept. of Informatics, Slovak Academy of Sciences, Bratislava, Slovak Republic
– sequence: 2
  givenname: Gabriel
  surname: Okša
  fullname: Okša, Gabriel
  email: Gabriel.Oksa@savba.sk
  organization: Institute of Mathematics, Dept. of Informatics, Slovak Academy of Sciences, Bratislava, Slovak Republic
– sequence: 3
  givenname: Marián
  surname: Vajteršic
  fullname: Vajteršic, Marián
  organization: Dept. of Computer Sciences, University of Salzburg, Salzburg, Austria
– sequence: 4
  givenname: Laura
  surname: Grigori
  fullname: Grigori, Laura
  organization: INRIA, University Paris Sud-11, Orsay, France
ContentType Journal Article
Copyright 2010 Elsevier B.V.
DOI 10.1016/j.parco.2009.12.013
Discipline Computer Science
EISSN 1872-7336
EndPage 307
ISSN 0167-8191
IsPeerReviewed true
IsScholarly true
Issue 5
Keywords QR algorithm
QR iteration
Message passing interface
Process grid
Cyclic matrix distribution
Two-sided block-Jacobi method
Blocking factor
Singular value decomposition
Language English
PageCount 11
PublicationDate 2010-06-01
PublicationTitle Parallel computing
PublicationYear 2010
Publisher Elsevier B.V
StartPage 297
SubjectTerms Algorithms
Blocking
Blocking factor
Cyclic matrix distribution
Factorization
Iterative methods
Mathematical analysis
Mathematical models
Matrices
Matrix methods
Message passing interface
Process grid
QR algorithm
QR iteration
Singular value decomposition
Two-sided block-Jacobi method
Title On iterative QR pre-processing in the parallel block-Jacobi SVD algorithm
URI https://dx.doi.org/10.1016/j.parco.2009.12.013
https://www.proquest.com/docview/753645452
Volume 36