On iterative QR pre-processing in the parallel block-Jacobi SVD algorithm
An efficient version of the parallel two-sided block-Jacobi algorithm for the singular value decomposition of an m × n matrix A includes the pre-processing step, which consists of the QR factorization of A with column pivoting followed by the optional LQ factorization of the R-factor. Then the itera...
| Published in: | Parallel Computing, Volume 36, Issue 5, pp. 297-307 |
|---|---|
| Main authors: | Martin Bečka, Gabriel Okša, Marián Vajteršic, Laura Grigori |
| Format: | Journal Article |
| Language: | English |
| Publication details: | Elsevier B.V., 1 June 2010 |
| ISSN: | 0167-8191, 1872-7336 |
| Abstract | An efficient version of the parallel two-sided block-Jacobi algorithm for the singular value decomposition of an $m \times n$ matrix $A$ includes a pre-processing step, which consists of the QR factorization of $A$ with column pivoting followed by an optional LQ factorization of the R-factor. The iterative two-sided block-Jacobi algorithm is then applied in parallel to the R-factor (or L-factor). For the efficient computation of the parallel QR (or LQ) factorization with (or without) column pivoting as implemented in ScaLAPACK, a matrix block-cyclic distribution on a process grid $r \times c$ with $p = r \times c$, $r, c \geq 1$, and block size $n_b \times n_b$ is required so that all processors remain busy during the whole parallel QR (or LQ) factorization. Optimal values for the parameters $r$, $c$ and $n_b$ are estimated experimentally using matrices of order $n = 4000$ and $8000$, and the number of processors $p = 8$ and $16$, respectively. It turns out that the optimal values are about $n_b = 100$ and $r \leq c$ with both $r$ and $c$ near to $\sqrt{p}$. These parameters are then used in numerical experiments for six different distributions of singular values combined with well-conditioned ($\kappa = 10^{1}$) and ill-conditioned ($\kappa = 10^{8}$) matrices. It is shown that, using optimal parameters in the pre-processing step, the parallel two-sided block-Jacobi SVD algorithm performs better than (or equally well as) the ScaLAPACK routine PDGESVD for matrices with a multiple minimal/maximal singular value, regardless of the condition number. For other distributions of singular values, our algorithm is slower than ScaLAPACK. The un-pivoted QRLQ pre-processing step is then re-formulated and extended to the QR iteration, and its connection to the QR algorithm applied to specific symmetric, positive definite matrices is shown. This connection helps to explain observations in another set of experiments with a variable number of QR iteration steps. In general, the best results for all six distributions of singular values are achieved by using about six QR iteration steps in pre-processing. |
|---|---|
| Author | Vajteršic, Marián Bečka, Martin Grigori, Laura Okša, Gabriel |
| Author_xml | – sequence: 1 givenname: Martin surname: Bečka fullname: Bečka, Martin organization: Institute of Mathematics, Dept. of Informatics, Slovak Academy of Sciences, Bratislava, Slovak Republic – sequence: 2 givenname: Gabriel surname: Okša fullname: Okša, Gabriel email: Gabriel.Oksa@savba.sk organization: Institute of Mathematics, Dept. of Informatics, Slovak Academy of Sciences, Bratislava, Slovak Republic – sequence: 3 givenname: Marián surname: Vajteršic fullname: Vajteršic, Marián organization: Dept. of Computer Sciences, University of Salzburg, Salzburg, Austria – sequence: 4 givenname: Laura surname: Grigori fullname: Grigori, Laura organization: INRIA, University Paris Sud-11, Orsay, France |
| CitedBy_id | crossref_primary_10_1016_j_jmatprotec_2019_04_031 crossref_primary_10_1137_21M1411895 crossref_primary_10_1137_17M1117732 crossref_primary_10_1016_j_parco_2017_10_004 crossref_primary_10_1109_LRA_2018_2854295 |
| Cites_doi | 10.1016/0024-3795(87)90103-0 10.1023/A:1024082314087 10.1016/j.parco.2005.06.006 10.1016/S0167-8191(01)00138-7 10.1137/S0895479892236532 10.1137/S1064827597319519 |
| ContentType | Journal Article |
| Copyright | 2010 Elsevier B.V. |
| Copyright_xml | – notice: 2010 Elsevier B.V. |
| DBID | AAYXX CITATION 7SC 8FD JQ2 L7M L~C L~D |
| DOI | 10.1016/j.parco.2009.12.013 |
| DatabaseName | CrossRef Computer and Information Systems Abstracts Technology Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional |
| DatabaseTitle | CrossRef Computer and Information Systems Abstracts Technology Research Database Computer and Information Systems Abstracts – Academic Advanced Technologies Database with Aerospace ProQuest Computer Science Collection Computer and Information Systems Abstracts Professional |
| DatabaseTitleList | Computer and Information Systems Abstracts |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISSN | 1872-7336 |
| EndPage | 307 |
| ExternalDocumentID | 10_1016_j_parco_2009_12_013 S0167819110000232 |
| ISICitedReferencesCount | 10 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000279086400008&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 0167-8191 |
| IngestDate | Thu Oct 02 11:22:30 EDT 2025 Sat Nov 29 07:23:13 EST 2025 Tue Nov 18 21:59:22 EST 2025 Fri Feb 23 02:30:42 EST 2024 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 5 |
| Keywords | QR algorithm QR iteration Message passing interface Process grid Cyclic matrix distribution Two-sided block-Jacobi method Blocking factor Singular value decomposition |
| Language | English |
| LinkModel | OpenURL |
| PQID | 753645452 |
| PQPubID | 23500 |
| PageCount | 11 |
| ParticipantIDs | proquest_miscellaneous_753645452 crossref_citationtrail_10_1016_j_parco_2009_12_013 crossref_primary_10_1016_j_parco_2009_12_013 elsevier_sciencedirect_doi_10_1016_j_parco_2009_12_013 |
| PublicationCentury | 2000 |
| PublicationDate | 20100601 |
| PublicationDateYYYYMMDD | 2010-06-01 |
| PublicationDate_xml | – month: 06 year: 2010 text: 20100601 day: 01 |
| PublicationDecade | 2010 |
| PublicationTitle | Parallel computing |
| PublicationYear | 2010 |
| Publisher | Elsevier B.V |
| Publisher_xml | – name: Elsevier B.V |
| SSID | ssj0006480 |
| Score | 1.9243464 |
| SourceID | proquest crossref elsevier |
| SourceType | Aggregation Database Enrichment Source Index Database Publisher |
| StartPage | 297 |
| SubjectTerms | Algorithms Blocking Blocking factor Cyclic matrix distribution Factorization Iterative methods Mathematical analysis Mathematical models Matrices Matrix methods Message passing interface Process grid QR algorithm QR iteration Singular value decomposition Two-sided block-Jacobi method |
| Title | On iterative QR pre-processing in the parallel block-Jacobi SVD algorithm |
| URI | https://dx.doi.org/10.1016/j.parco.2009.12.013 https://www.proquest.com/docview/753645452 |
| Volume | 36 |
| WOSCitedRecordID | wos000279086400008&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVESC databaseName: Elsevier SD Freedom Collection Journals 2021 customDbUrl: eissn: 1872-7336 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0006480 issn: 0167-8191 databaseCode: AIEXJ dateStart: 19950101 isFulltext: true titleUrlDefault: https://www.sciencedirect.com providerName: Elsevier |
| linkProvider | Elsevier |
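The QR iteration mentioned in the abstract can be illustrated with a small serial sketch. This is not the authors' parallel ScaLAPACK implementation; it is a minimal NumPy illustration under the assumption that one iteration step means "QR-factorize the current matrix without pivoting, keep the triangular factor, and transpose it", so that the first step is the QR factorization of $A$ and the second step is the LQ factorization of the R-factor. The function names `qr_iteration_preprocess` and `off_diagonal_norm` are hypothetical and introduced only for this example. Because every step applies an orthogonal transformation, the singular values are preserved, and in exact arithmetic each step maps $X^T X = R^T R$ to $R R^T$, which is one way to see the link to an iteration on a symmetric, positive definite matrix.

```python
import numpy as np


def qr_iteration_preprocess(a, steps=6):
    """Apply `steps` un-pivoted QR factorizations, transposing the
    triangular factor after each one (illustrative, serial sketch)."""
    x = np.asarray(a, dtype=float)
    for _ in range(steps):
        # Reduced QR: x = q @ r with r upper triangular; q is discarded
        # here because this sketch only tracks the triangular factor.
        _, r = np.linalg.qr(x)
        # Transposing r means the next QR step acts as an LQ
        # factorization of the current R-factor.
        x = r.T
    return x


def off_diagonal_norm(x):
    """Frobenius norm of the off-diagonal part of a square matrix."""
    return np.linalg.norm(x - np.diag(np.diag(x)))


if __name__ == "__main__":
    # Assumed test matrix: prescribed singular values 16, 8, 4, 2, 1
    # with random orthonormal singular vectors.
    rng = np.random.default_rng(0)
    u, _ = np.linalg.qr(rng.standard_normal((8, 5)))
    v, _ = np.linalg.qr(rng.standard_normal((5, 5)))
    a = u @ np.diag([16.0, 8.0, 4.0, 2.0, 1.0]) @ v.T

    for k in (1, 2, 6):
        x = qr_iteration_preprocess(a, steps=k)
        print(k, off_diagonal_norm(x))
```

In this sketch the triangular factor becomes more nearly diagonal as the number of steps grows, which is the property that would make the subsequent Jacobi iteration cheaper. The orthogonal factors are dropped for brevity; a full SVD would need to accumulate them to recover the singular vectors.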