A high performance implementation of Zolo-SVD algorithm on distributed memory systems
This paper introduces a high performance implementation of the Zolo-SVD algorithm on distributed memory systems, which is based on the polar decomposition (PD) algorithm via the Zolotarev’s function (Zolo-PD), originally proposed by Nakatsukasa and Freund [SIAM Review, 2016]. Our implementation high...
| Published in: | Parallel computing, Volume 86; pp. 57 - 65 |
|---|---|
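| Main authors: | Li, Shengguo; Liu, Jie; Du, Yunfei |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V, 1 August 2019 |
| Subjects: | Distributed parallel algorithm; Polar decomposition; QDWH; ScaLAPACK; Zolotarev |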
| ISSN: | 0167-8191, 1872-7336 |
| Abstract | This paper introduces a high-performance implementation of the Zolo-SVD algorithm on distributed memory systems. It is based on the polar decomposition (PD) algorithm via Zolotarev's function (Zolo-PD), originally proposed by Nakatsukasa and Freund [SIAM Review, 2016]. Our implementation relies heavily on ScaLAPACK routines and is therefore portable. Compared with other PD algorithms such as the QR-based dynamically weighted Halley method (QDWH-PD), Zolo-PD is naturally parallelizable and scales better, although it performs more floating-point operations. When using many processors, Zolo-PD is usually 1.20 times faster than the QDWH-PD algorithm, and Zolo-SVD can be about two times faster than the ScaLAPACK routine PDGESVD. These numerical experiments were performed on the Tianhe-2A supercomputer, one of the fastest supercomputers in the world. The tested matrices include sparse matrices from particular applications and randomly generated dense matrices of different dimensions. Our QDWH-SVD and Zolo-SVD implementations are freely available at https://github.com/shengguolsg/Zolo-SVD. |
|---|---|
| Author | Li, Shengguo; Liu, Jie; Du, Yunfei |
| Author_xml | 1. Li, Shengguo (nudtlsg@gmail.com), College of Computer Science, National University of Defense Technology, Changsha 410073, China; 2. Liu, Jie, College of Computer Science, National University of Defense Technology, Changsha 410073, China; 3. Du, Yunfei (duyunfei@mail.sysu.edu.cn), Department of Computer Science, Sun Yat-Sen University, Guangzhou 510006, China |
| Cites_doi | 10.1007/BF02287921 10.1145/2049662.2049663 10.1137/120876605 10.1016/j.parco.2011.05.002 10.1137/090771806 10.1137/1.9781611971446 10.1137/080731992 10.1137/16M1058467 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9 10.1016/S0024-3795(01)00569-9 10.1016/S0167-8191(99)00041-1 10.1137/050628301 10.1137/050636723 10.1137/0911052 10.1137/S1064827597329266 10.1145/2450153.2450154 10.1145/2894747 10.1016/j.laa.2004.09.019 10.1137/S0895479892242232 10.1137/090774999 10.1137/S1064827597327309 10.1109/TPDS.2017.2755655 10.1016/0010-4655(96)00017-3 |
| ContentType | Journal Article |
| Copyright | 2019 Elsevier B.V. |
| Copyright_xml | – notice: 2019 Elsevier B.V. |
| DBID | AAYXX CITATION |
| DOI | 10.1016/j.parco.2019.04.004 |
| DatabaseName | CrossRef |
| DatabaseTitle | CrossRef |
| DatabaseTitleList | |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISSN | 1872-7336 |
| EndPage | 65 |
| ExternalDocumentID | 10_1016_j_parco_2019_04_004 S0167819118301807 |
| ISICitedReferencesCount | 0 |
| ISSN | 0167-8191 |
| IngestDate | Sat Nov 29 04:06:56 EST 2025 Fri Feb 23 02:29:26 EST 2024 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | QDWH; Zolotarev; ScaLAPACK; Polar decomposition; Distributed parallel algorithm; 68W10; 65F15 |
| Language | English |
| LinkModel | OpenURL |
| PageCount | 9 |
| ParticipantIDs | crossref_primary_10_1016_j_parco_2019_04_004 elsevier_sciencedirect_doi_10_1016_j_parco_2019_04_004 |
| PublicationCentury | 2000 |
| PublicationDate | August 2019 2019-08-00 |
| PublicationDateYYYYMMDD | 2019-08-01 |
| PublicationDate_xml | – month: 08 year: 2019 text: August 2019 |
| PublicationDecade | 2010 |
| PublicationTitle | Parallel computing |
| PublicationYear | 2019 |
| Publisher | Elsevier B.V |
| Publisher_xml | – name: Elsevier B.V |
| References | Akhiezer (1990). Elements of the Theory of Elliptic Functions. • Auckenthaler, Blum, Bungartz, Huckle, Johanni, Krämer, Lang, Lederer, Willems (2011). Parallel solution of partial symmetric eigenvalue problems from electronic structure calculations. Parallel Comput. 37(12), 783-794. doi:10.1016/j.parco.2011.05.002 • Ballard, Demmel, Holtz, Schwartz (2011). Minimizing communication in numerical linear algebra. SIAM J. Matrix Anal. Appl. 21(2), 562-580. • Barlow, Bosner, Drmač (2005). A new stable bidiagonal reduction algorithm. Linear Algebra Appl. 397, 35-84. doi:10.1016/j.laa.2004.09.019 • Blackford, Choi, Cleary, D’Azevedo, Demmel, Dhillon, Dongarra, Hammarling, Henry, Petitet, Stanley, Walker, Whaley (1997). ScaLAPACK Users’ Guide. • Bosner, Barlow (2007). Block and parallel versions of one-sided bidiagonalization. SIAM J. Matrix Anal. Appl. 29(3), 927-953. doi:10.1137/050636723 • Choi, Demmel, Dhillon, Dongarra, Ostrouchov, Petitet, Stanley, Walker, Whaley (1996). ScaLAPACK: a portable linear algebra library for distributed memory computers - design issues and performance. Comput. Phys. Commun. 97, 1-15. doi:10.1016/0010-4655(96)00017-3 • Davis, Hu (2011). The University of Florida sparse matrix collection. ACM Trans. Math. Softw. 38(1). doi:10.1145/2049662.2049663 • Deerwester, Dumais, Furnas, Landauer, Harshman (1990). Indexing by latent semantic analysis. J. Soc. Inf. Sci. 41, 391-407. doi:10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9 • Demmel, J. (1997). Applied numerical linear algebra. SIAM, Philadelphia. doi:10.1137/1.9781611971446 • Demmel, Grigori, Hoemmen, Langou (2012). Communication-optimal parallel and sequential QR and LU factorizations. SIAM J. Sci. Comput. 34, A206-A239. doi:10.1137/080731992 • Demmel, Kahan (1990). Accurate singular values of bidiagonal matrices. SIAM J. Sci. Comput. 11, 873-912. doi:10.1137/0911052 • Faverge, M., Langou, J., Robert, Y., Dongarra, J. (2016). Bidiagonalization with parallel tiled algorithms. arXiv:1611.06892v1. • Golub, Kahan (1965). Calculating the singular values and pseudo-inverse of a matrix. SIAM J. Numer. Anal. 2, 205-224. • Großer, Lang (1999). Efficient parallel reduction to bidiagonal form. Parallel Comput. 25, 969-986. doi:10.1016/S0167-8191(99)00041-1 • Gu, Eisenstat (1995). A divide-and-conquer algorithm for the bidiagonal SVD. SIAM J. Matrix Anal. Appl. 16(1), 79-92. doi:10.1137/S0895479892242232 • Haidar, Luszczek, Kurzak, Dongarra (2013). An Improved Parallel Singular Value Algorithm and Its Implementation for Multicore Hardware. In William Gropp, editor. • Halko, Martinsson, Tropp (2011). Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. 53, 217-288. doi:10.1137/090771806 • Higham (2008). Functions of Matrices: Theory and Computation. • Higham, Papadimitriou (1993). Parallel Singular Value Decomposition via the Polar Decomposition. Technical Report, Numerical Analysis Report 239, Manchester Centre for Computational Mathematics, Manchester, England. • Higham, Papadimitriou (1994). A New Parallel Algorithm for Computing the Singular Value Decomposition. The Fifth SIAM Conference on Applied Linear Algebra, 80-84. • Hotelling (1935). Simplified calculation of principal components. Psychometrika 1, 27-35. doi:10.1007/BF02287921 • Iwen, Ong (2016). A distributed and incremental SVD algorithm for agglomerative data analysis on large networks. SIAM J. Matrix Anal. Appl. 37(4), 1699-1718. doi:10.1137/16M1058467 • Ltaief, Luszczek, Dongarra (2013). High-performance bidiagonal reduction using tile algorithms on homogeneous multicore architectures. ACM Trans. Math. Softw. 39(3), 1-22. doi:10.1145/2450153.2450154 • Ltaief, H., Sukkari, D., Esposito, A., Nakatsukasa, Y., Keyes, D. (2018). Massively parallel polar decomposition on distributed-memory systems. Submitted to ACM Transactions on Parallel Computing (under revision). • Marek, Blum, Johanni, Havu, Lang, Auckenthaler, Heinecke, Bungartz, Lederer (2014). The ELPA library: scalable parallel eigenvalue solutions for electronic structure theory and computational science. J. Phys.: Condens. Matter 26, 1-15. • Moore (1981). Principal component analysis in linear systems: controllability, observability, and model reduction. IEEE Trans. Autom. Control 1. • Nakatsukasa, Bai, Gygi (2010). Optimizing Halley’s iteration for computing the matrix polar decomposition. SIAM J. Matrix Anal. Appl. 31, 2700-2720. doi:10.1137/090774999 • Nakatsukasa, Freund (2016). Computing fundamental matrix decompositions accurately via the matrix sign function in two iterations: the power of Zolotarev’s functions. SIAM Rev. xx(x), xx-xxx. • Nakatsukasa, Higham (2013). Stable and efficient spectral divide and conquer algorithms for the symmetric eigenvalue decomposition and the SVD. SIAM J. Sci. Comput. 35(3), A1325-A1349. doi:10.1137/120876605 • Ralha (2003). One-sided reduction to bidiagonal form. Linear Algebra Appl. 358, 219-238. doi:10.1016/S0024-3795(01)00569-9 • Simon, Zha (2000). Low-rank matrix approximation using the Lanczos bidiagonalization process with applications. SIAM J. Sci. Comput. 21, 2257-2274. doi:10.1137/S1064827597327309 • Sukkari, D., Ltaief, H., Esposito, A., Keyes, D. (2017). A QDWH-based SVD software framework on distributed-memory manycore systems. Submitted to ACM Transactions on Mathematical Software (under revision). • Sukkari, Ltaief, Faverge, Keyes (2018). Asynchronous task-based polar decomposition on single node manycore architectures. IEEE Trans. Parallel Distrib. Syst. 29(2), 312-323. doi:10.1109/TPDS.2017.2755655 • Sukkari, Ltaief, Keyes (2016). A high performance QDWH-SVD solver using hardware accelerators. ACM Trans. Math. Softw. 43(1), 1-25. doi:10.1145/2894747 • Sukkari, Ltaief, Keyes (2016). High Performance Polar Decomposition on Distributed Memory Systems. Euro-Par 2016, LNCS, volume 9833, 605-616. • Willems, Lang, Vömel (2006). Computing the bidiagonal SVD using multiple relatively robust representations. SIAM J. Matrix Anal. Appl. 28(4), 907-926. doi:10.1137/050628301 • Zha, Simon (1999). On updating problems in latent semantic indexing. SIAM J. Sci. Comput. 21, 782-791. doi:10.1137/S1064827597329266 |
| SourceID | crossref elsevier |
| SourceType | Index Database Publisher |
| StartPage | 57 |
| SubjectTerms | Distributed parallel algorithm; Polar decomposition; QDWH; ScaLAPACK; Zolotarev |
| Title | A high performance implementation of Zolo-SVD algorithm on distributed memory systems |
| URI | https://dx.doi.org/10.1016/j.parco.2019.04.004 |
| Volume | 86 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVESC databaseName: Elsevier SD Freedom Collection Journals 2021 customDbUrl: eissn: 1872-7336 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0006480 issn: 0167-8191 databaseCode: AIEXJ dateStart: 19950101 isFulltext: true titleUrlDefault: https://www.sciencedirect.com providerName: Elsevier |