A high performance implementation of Zolo-SVD algorithm on distributed memory systems

Detailed bibliographic record
Published in: Parallel Computing, Vol. 86, pp. 57-65
Main authors: Li, Shengguo; Liu, Jie; Du, Yunfei
Medium: Journal Article
Language: English
Published: Elsevier B.V., 1 August 2019
ISSN: 0167-8191, 1872-7336
Online access: full text at https://dx.doi.org/10.1016/j.parco.2019.04.004
Abstract This paper introduces a high-performance implementation of the Zolo-SVD algorithm on distributed memory systems. Zolo-SVD is based on the polar decomposition (PD) algorithm via Zolotarev's functions (Zolo-PD), originally proposed by Nakatsukasa and Freund [SIAM Review, 2016]. Our implementation relies heavily on ScaLAPACK routines and is therefore portable. Compared with other PD algorithms, such as the QR-based dynamically weighted Halley method (QDWH-PD), Zolo-PD is naturally parallelizable and scales better, although it performs more floating-point operations. When many processors are used, Zolo-PD is usually 1.20 times faster than QDWH-PD, and Zolo-SVD can be about two times faster than the ScaLAPACK routine PDGESVD. The numerical experiments were performed on the Tianhe-2A supercomputer, one of the fastest supercomputers in the world, and the test matrices include sparse matrices from particular applications and randomly generated dense matrices of various dimensions. Our QDWH-SVD and Zolo-SVD implementations are freely available at https://github.com/shengguolsg/Zolo-SVD.
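The abstract compares two polar decomposition (PD) iterations. As we understand the formulation of Nakatsukasa and Freund, both QDWH-PD and Zolo-PD compute the polar factor U_p of A = U_p H by repeatedly applying an odd rational function to the scaled iterate; that function acts on every singular value in [ℓ, 1] in the same way, so convergence can be followed by iterating the lower bound ℓ alone. Zolo-SVD then obtains the SVD from the symmetric eigendecomposition H = V Σ V^T, because A = (U_p V) Σ V^T. The Python sketch below is a scalar illustration only, not the authors' ScaLAPACK code. Every function name in it is hypothetical. The DWH weights follow Nakatsukasa, Bai and Gygi (2010). The coefficients c_i = ℓ² sn²(iK'/(2r+1); ℓ') / cn²(iK'/(2r+1); ℓ'), with ℓ' = sqrt(1 - ℓ²), reflect our reading of Nakatsukasa and Freund. The Jacobi elliptic functions are evaluated with a plain descending-AGM scheme.

```python
import math

# Illustrative scalar sketch with hypothetical helper names (not the Zolo-SVD code):
# each PD step maps every singular value in [ell, 1] by the same odd rational
# function, so iterating the lower bound ell shows how fast X_k approaches U_p.


def dwh_weights(ell):
    """Dynamically weighted Halley weights (a, b, c) for the interval [ell, 1]."""
    ell2 = ell * ell
    gamma = (4.0 * (1.0 - ell2) / (ell2 * ell2)) ** (1.0 / 3.0)
    root = math.sqrt(1.0 + gamma)
    a = root + 0.5 * math.sqrt(8.0 - 4.0 * gamma + 8.0 * (2.0 - ell2) / (ell2 * root))
    b = (a - 1.0) ** 2 / 4.0
    c = a + b - 1.0
    return a, b, c


def qdwh_iterations(ell, tol=1e-15, max_iter=20):
    """Count DWH steps x -> x(a + b x^2) / (1 + c x^2) until ell is within tol of 1."""
    iters = 0
    while 1.0 - ell > tol and iters < max_iter:
        a, b, c = dwh_weights(ell)
        # The map sends 1 to 1; clamping guards against rounding just above 1.
        ell = min(1.0, ell * (a + b * ell * ell) / (1.0 + c * ell * ell))
        iters += 1
    return iters, ell


def agm(a, b):
    """Arithmetic-geometric mean of two positive numbers."""
    for _ in range(64):
        if abs(a - b) <= 1e-15 * a:
            break
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return a


def jacobi_sn_cn(u, kp):
    """sn(u; k) and cn(u; k) for modulus k = sqrt(1 - kp^2), given kp directly.

    Descending AGM / Landen scheme (Abramowitz & Stegun 16.4). Passing the
    complementary modulus avoids forming 1 - ell^2 when ell is tiny.
    """
    a, b = 1.0, kp
    a_list, c_list = [a], [0.0]  # c_list[0] would be k; it is never used
    for _ in range(64):
        c = 0.5 * (a - b)
        a, b = 0.5 * (a + b), math.sqrt(a * b)
        a_list.append(a)
        c_list.append(c)
        if abs(c) <= 1e-15 * a:
            break
    n = len(a_list) - 1
    phi = 2.0 ** n * a_list[n] * u
    for i in range(n, 0, -1):  # backward recurrence for the amplitude phi
        phi = 0.5 * (phi + math.asin(c_list[i] * math.sin(phi) / a_list[i]))
    return math.sin(phi), math.cos(phi)


def zolo_coefficients(ell, r):
    """c_1 < ... < c_2r with c_i = ell^2 * (sn / cn)^2 at i K' / (2r + 1), modulus ell'."""
    k_prime = math.pi / (2.0 * agm(1.0, ell))  # K' = K(ell'), ell' = sqrt(1 - ell^2)
    coeffs = []
    for i in range(1, 2 * r + 1):
        sn, cn = jacobi_sn_cn(i * k_prime / (2 * r + 1), ell)
        coeffs.append(ell * ell * (sn / cn) ** 2)
    return coeffs


def zolo_map(x, coeffs):
    """Unnormalized Zolotarev function x * prod_j (x^2 + c_2j) / (x^2 + c_2j-1)."""
    value = x
    for j in range(0, len(coeffs), 2):
        value *= (x * x + coeffs[j + 1]) / (x * x + coeffs[j])
    return value


def zolo_iterations(ell, r, tol=1e-15, max_iter=20):
    """Count degree-(2r + 1) Zolotarev steps until ell is within tol of 1."""
    iters = 0
    while 1.0 - ell > tol and iters < max_iter:
        coeffs = zolo_coefficients(ell, r)
        # Normalize so that 1 is mapped to 1 (the scaled Zolotarev function).
        ell = min(1.0, zolo_map(ell, coeffs) / zolo_map(1.0, coeffs))
        iters += 1
    return iters, ell


if __name__ == "__main__":
    ell0 = 1e-8  # illustrative lower bound on the scaled singular values
    print("QDWH iterations:", qdwh_iterations(ell0)[0])
    print("Zolo (r = 8) iterations:", zolo_iterations(ell0, r=8)[0])
```

Running it prints the iteration counts for the illustrative choices ℓ₀ = 1e-8 and r = 8. In the matrix setting, the r factors of the Zolotarev map can be expanded in partial fractions whose terms need independent QR or Cholesky factorizations; as we understand it, that independence is why Zolo-PD parallelizes naturally while performing more floating-point operations overall, as the abstract notes. The naive sn/cn evaluation here loses accuracy when ℓ approaches machine precision, which is why the demo uses ℓ₀ = 1e-8.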
Author Li, Shengguo
Liu, Jie
Du, Yunfei
Author details
1. Shengguo Li (nudtlsg@gmail.com), College of Computer Science, National University of Defense Technology, Changsha 410073, China
2. Jie Liu, College of Computer Science, National University of Defense Technology, Changsha 410073, China
3. Yunfei Du (duyunfei@mail.sysu.edu.cn), Department of Computer Science, Sun Yat-Sen University, Guangzhou 510006, China
Cites_doi 10.1007/BF02287921
10.1145/2049662.2049663
10.1137/120876605
10.1016/j.parco.2011.05.002
10.1137/090771806
10.1137/1.9781611971446
10.1137/080731992
10.1137/16M1058467
10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9
10.1016/S0024-3795(01)00569-9
10.1016/S0167-8191(99)00041-1
10.1137/050628301
10.1137/050636723
10.1137/0911052
10.1137/S1064827597329266
10.1145/2450153.2450154
10.1145/2894747
10.1016/j.laa.2004.09.019
10.1137/S0895479892242232
10.1137/090774999
10.1137/S1064827597327309
10.1109/TPDS.2017.2755655
10.1016/0010-4655(96)00017-3
ContentType Journal Article
Copyright 2019 Elsevier B.V.
DOI 10.1016/j.parco.2019.04.004
Discipline Computer Science
EISSN 1872-7336
EndPage 65
ISSN 0167-8191
IsPeerReviewed true
IsScholarly true
Keywords QDWH
Zolotarev
ScaLAPACK
Polar decomposition
Distributed parallel algorithm
68W10
65F15
Language English
PageCount 9
PublicationDate August 2019
PublicationDateYYYYMMDD 2019-08-01
PublicationTitle Parallel computing
PublicationYear 2019
Publisher Elsevier B.V
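References
Akhiezer (1990). Elements of the Theory of Elliptic Functions.
Auckenthaler, Blum, Bungartz, Huckle, Johanni, Krämer, Lang, Lederer, Willems (2011). Parallel solution of partial symmetric eigenvalue problems from electronic structure calculations. Parallel Comput. 37(12), 783-794. doi:10.1016/j.parco.2011.05.002
Ballard, Demmel, Holtz, Schwartz (2011). Minimizing communication in numerical linear algebra. SIAM J. Matrix Anal. Appl. 21(2), 562-580.
Barlow, Bosner, Drmač (2005). A new stable bidiagonal reduction algorithm. Linear Algebra Appl. 397, 35-84. doi:10.1016/j.laa.2004.09.019
Blackford, Choi, Cleary, D’Azevedo, Demmel, Dhillon, Dongarra, Hammarling, Henry, Petitet, Stanley, Walker, Whaley (1997). ScaLAPACK Users’ Guide.
Bosner, Barlow (2007). Block and parallel versions of one-sided bidiagonalization. SIAM J. Matrix Anal. Appl. 29(3), 927-953. doi:10.1137/050636723
Choi, Demmel, Dhillon, Dongarra, Ostrouchov, Petitet, Stanley, Walker, Whaley (1996). ScaLAPACK: a portable linear algebra library for distributed memory computers: design issues and performance. Comput. Phys. Commun. 97, 1-15. doi:10.1016/0010-4655(96)00017-3
Davis, Hu (2011). The University of Florida sparse matrix collection. ACM Trans. Math. Softw. 38(1). doi:10.1145/2049662.2049663
Deerwester, Dumais, Furnas, Landauer, Harshman (1990). Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41, 391-407. doi:10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9
Demmel (1997). Applied Numerical Linear Algebra. SIAM, Philadelphia. doi:10.1137/1.9781611971446
Demmel, Grigori, Hoemmen, Langou (2012). Communication-optimal parallel and sequential QR and LU factorizations. SIAM J. Sci. Comput. 34, A206-A239. doi:10.1137/080731992
Demmel, Kahan (1990). Accurate singular values of bidiagonal matrices. SIAM J. Sci. Stat. Comput. 11, 873-912. doi:10.1137/0911052
Faverge, Langou, Robert, Dongarra (2016). Bidiagonalization with parallel tiled algorithms. arXiv:1611.06892v1.
Golub, Kahan (1965). Calculating the singular values and pseudo-inverse of a matrix. SIAM J. Numer. Anal. 2, 205-224.
Großer, Lang (1999). Efficient parallel reduction to bidiagonal form. Parallel Comput. 25, 969-986. doi:10.1016/S0167-8191(99)00041-1
Gu, Eisenstat (1995). A divide-and-conquer algorithm for the bidiagonal SVD. SIAM J. Matrix Anal. Appl. 16(1), 79-92. doi:10.1137/S0895479892242232
Haidar, Luszczek, Kurzak, Dongarra (2013). An improved parallel singular value algorithm and its implementation for multicore hardware. In: William Gropp (Ed.).
Halko, Martinsson, Tropp (2011). Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. 53, 217-288. doi:10.1137/090771806
Higham (2008). Functions of Matrices: Theory and Computation.
Higham, Papadimitriou (1993). Parallel singular value decomposition via the polar decomposition. Numerical Analysis Report 239, Manchester Centre for Computational Mathematics, Manchester, England.
Higham, Papadimitriou (1994). A new parallel algorithm for computing the singular value decomposition. In: The Fifth SIAM Conference on Applied Linear Algebra, pp. 80-84.
Hotelling (1935). Simplified calculation of principal components. Psychometrika 1, 27-35. doi:10.1007/BF02287921
Iwen, Ong (2016). A distributed and incremental SVD algorithm for agglomerative data analysis on large networks. SIAM J. Matrix Anal. Appl. 37(4), 1699-1718. doi:10.1137/16M1058467
Ltaief, Luszczek, Dongarra (2013). High-performance bidiagonal reduction using tile algorithms on homogeneous multicore architectures. ACM Trans. Math. Softw. 39(3), 1-22. doi:10.1145/2450153.2450154
Ltaief, Sukkari, Esposito, Nakatsukasa, Keyes (2018). Massively parallel polar decomposition on distributed-memory systems. Submitted to ACM Transactions on Parallel Computing (under revision).
Marek, Blum, Johanni, Havu, Lang, Auckenthaler, Heinecke, Bungartz, Lederer (2014). The ELPA library: scalable parallel eigenvalue solutions for electronic structure theory and computational science. J. Phys.: Condens. Matter 26, 1-15.
Moore (1981). Principal component analysis in linear systems: controllability, observability, and model reduction. IEEE Trans. Autom. Control 26(1).
Nakatsukasa, Bai, Gygi (2010). Optimizing Halley’s iteration for computing the matrix polar decomposition. SIAM J. Matrix Anal. Appl. 31, 2700-2720. doi:10.1137/090774999
Nakatsukasa, Freund (2016). Computing fundamental matrix decompositions accurately via the matrix sign function in two iterations: the power of Zolotarev’s functions. SIAM Rev.
Nakatsukasa, Higham (2013). Stable and efficient spectral divide and conquer algorithms for the symmetric eigenvalue decomposition and the SVD. SIAM J. Sci. Comput. 35(3), A1325-A1349. doi:10.1137/120876605
Ralha (2003). One-sided reduction to bidiagonal form. Linear Algebra Appl. 358, 219-238. doi:10.1016/S0024-3795(01)00569-9
Simon, Zha (2000). Low-rank matrix approximation using the Lanczos bidiagonalization process with applications. SIAM J. Sci. Comput. 21, 2257-2274. doi:10.1137/S1064827597327309
Sukkari, Ltaief, Esposito, Keyes (2017). A QDWH-based SVD software framework on distributed-memory manycore systems. Submitted to ACM Transactions on Mathematical Software (under revision).
Sukkari, Ltaief, Faverge, Keyes (2018). Asynchronous task-based polar decomposition on single node manycore architectures. IEEE Trans. Parallel Distrib. Syst. 29(2), 312-323. doi:10.1109/TPDS.2017.2755655
Sukkari, Ltaief, Keyes (2016). A high performance QDWH-SVD solver using hardware accelerators. ACM Trans. Math. Softw. 43(1), 1-25. doi:10.1145/2894747
Sukkari, Ltaief, Keyes (2016). High performance polar decomposition on distributed memory systems. In: Euro-Par 2016, LNCS vol. 9833, pp. 605-616.
Willems, Lang, Vömel (2006). Computing the bidiagonal SVD using multiple relatively robust representations. SIAM J. Matrix Anal. Appl. 28(4), 907-926. doi:10.1137/050628301
Zha, Simon (1999). On updating problems in latent semantic indexing. SIAM J. Sci. Comput. 21, 782-791. doi:10.1137/S1064827597329266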
StartPage 57
SubjectTerms Distributed parallel algorithm
Polar decomposition
QDWH
ScaLAPACK
Zolotarev
Title A high performance implementation of Zolo-SVD algorithm on distributed memory systems
URI https://dx.doi.org/10.1016/j.parco.2019.04.004
Volume 86