A high performance implementation of Zolo-SVD algorithm on distributed memory systems

Bibliographic details
Published in: Parallel Computing, Vol. 86, pp. 57-65
Main authors: Li, Shengguo; Liu, Jie; Du, Yunfei
Format: Journal Article
Language: English
Published: Elsevier B.V., 1 August 2019
ISSN:0167-8191, 1872-7336
Description
Summary: This paper introduces a high-performance implementation of the Zolo-SVD algorithm on distributed memory systems. Zolo-SVD is based on the polar decomposition (PD) algorithm via Zolotarev's function (Zolo-PD), originally proposed by Nakatsukasa and Freund [SIAM Review, 2016]. Our implementation relies heavily on ScaLAPACK routines and is therefore portable. Compared with other PD algorithms such as the QR-based dynamically weighted Halley method (QDWH-PD), Zolo-PD is naturally parallelizable and scales better, although it performs more floating-point operations. When using many processors, Zolo-PD is usually 1.20 times faster than the QDWH-PD algorithm, and Zolo-SVD can be about twice as fast as the ScaLAPACK routine PDGESVD. The numerical experiments were performed on the Tianhe-2A supercomputer, one of the fastest supercomputers in the world. The test matrices include sparse matrices from particular applications and randomly generated dense matrices of various dimensions. Our QDWH-SVD and Zolo-SVD implementations are freely available at https://github.com/shengguolsg/Zolo-SVD.
DOI:10.1016/j.parco.2019.04.004
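
The summary describes Zolo-SVD as an SVD built on top of a polar decomposition. As a rough single-node illustration of that general idea (not the authors' distributed implementation), the sketch below computes the polar factors A = U_p H, diagonalizes the symmetric factor H = V diag(s) V^T, and assembles A = (U_p V) diag(s) V^T. It assumes a real matrix with at least as many rows as columns, and it uses `scipy.linalg.polar` purely as a stand-in for Zolo-PD; the function name `svd_via_polar` is hypothetical.

```python
import numpy as np
from scipy.linalg import polar


def svd_via_polar(a):
    """Return (u, s, vt) with a ~= u @ diag(s) @ vt for a real m x n matrix, m >= n.

    Illustrative only: scipy.linalg.polar stands in for Zolo-PD or QDWH-PD.
    """
    # Step 1: polar decomposition a = u_p @ h, where u_p has orthonormal
    # columns and h is symmetric positive semidefinite.
    u_p, h = polar(a, side="right")

    # Remove any rounding-level asymmetry before the symmetric eigensolver.
    h = (h + h.T) / 2

    # Step 2: symmetric eigendecomposition h = v @ diag(w) @ v.T.
    w, v = np.linalg.eigh(h)

    # eigh returns eigenvalues in ascending order; singular values are
    # conventionally listed in descending order.
    order = np.argsort(w)[::-1]
    s = w[order]
    v = v[:, order]

    # Step 3: left singular vectors are the polar factor applied to v.
    u = u_p @ v
    return u, s, v.T
```

On a small random dense matrix, the singular values returned this way should agree with those of `numpy.linalg.svd` up to rounding. In a distributed setting the two steps would map onto a parallel polar decomposition and a parallel symmetric eigensolver, which this sketch does not attempt to model.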