On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal Matrix Factorizations

Matrix factorizations are among the most important building blocks of scientific computing. However, state-of-the-art libraries are not communication-optimal, underutilizing current parallel architectures. We present novel algorithms for Cholesky and LU factorizations that utilize an asymptotically...

Detailed bibliography
Published in:	SC21: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-15
Main authors:	Kwasniewski, Grzegorz; Kabic, Marko; Ben-Nun, Tal; Ziogas, Alexandros Nikolaos; Saethre, Jens Eirik; Gaillard, Andre; Schneider, Timo; Besta, Maciej; Kozhevnikov, Anton; VandeVondele, Joost; Hoefler, Torsten
Format:	Conference paper
Language:	English
Publication details:	ACM, 14 November 2021
Subjects:	Codes; communication complexity; Distributed linear algebra algorithms; Layout; Libraries; matrix factorization; Schedules; Scientific computing; Supercomputers; Three-dimensional displays
ISSN:	2167-4337

Abstract	Matrix factorizations are among the most important building blocks of scientific computing. However, state-of-the-art libraries are not communication-optimal, underutilizing current parallel architectures. We present novel algorithms for Cholesky and LU factorizations that utilize an asymptotically communication-optimal 2.5D decomposition. We first establish a theoretical framework for deriving parallel I/O lower bounds for linear algebra kernels, and then utilize its insights to derive Cholesky and LU schedules, both communicating $N^{3}/(P\sqrt{M})$ elements per processor, where $M$ is the local memory size. The empirical results match our theoretical analysis: our implementations communicate significantly less than Intel MKL, SLATE, and the asymptotically communication-optimal CANDMC and CAPITAL libraries. Our code outperforms these state-of-the-art libraries in almost all tested scenarios, with matrix sizes ranging from 2,048 to 524,288 on up to 512 CPU nodes of the Piz Daint supercomputer, decreasing the time-to-solution by up to three times. Our code is ScaLAPACK-compatible and available as an open-source library.
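The per-processor communication term quoted in the abstract, N^3/(P*sqrt(M)), can be evaluated directly to get a feel for how it scales. The following is only a minimal sketch: the function name and the sample values of P and M are hypothetical, N and P are read as matrix dimension and processor count (the abstract defines only M), and constant factors and lower-order terms, which the abstract does not give, are omitted.

```python
import math


def comm_volume_per_processor(n: int, p: int, m: float) -> float:
    """Leading-order term N^3 / (P * sqrt(M)) as quoted in the abstract.

    n: matrix dimension N, p: number of processors P,
    m: local memory size M in elements.
    Constant factors and lower-order terms are NOT modeled (assumption).
    """
    if n <= 0 or p <= 0 or m <= 0:
        raise ValueError("n, p and m must all be positive")
    return n**3 / (p * math.sqrt(m))


# Hypothetical configuration: N = 2,048 (the smallest matrix size named in
# the abstract); P = 64 and M = 2**20 are illustrative values only.
base = comm_volume_per_processor(2048, 64, 2**20)

# Because of the sqrt(M) in the denominator, quadrupling the local memory
# halves the leading-order communication volume.
more_memory = comm_volume_per_processor(2048, 64, 4 * 2**20)

print(base, more_memory)
```

The sketch is meant only to show the 1/sqrt(M) dependence that distinguishes a 2.5D decomposition's bound; it says nothing about the actual schedules or measured volumes reported in the paper.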
Author_xml
– sequence: 1 givenname: Grzegorz surname: Kwasniewski fullname: Kwasniewski, Grzegorz organization: ETH Zurich Swiss National Computing Center, Department of Computer Science, Switzerland
– sequence: 2 givenname: Marko surname: Kabic fullname: Kabic, Marko organization: ETH Zurich Swiss National Computing Center, Department of Computer Science, Switzerland
– sequence: 3 givenname: Tal surname: Ben-Nun fullname: Ben-Nun, Tal organization: ETH Zurich Swiss National Computing Center, Department of Computer Science, Switzerland
– sequence: 4 givenname: Alexandros Nikolaos surname: Ziogas fullname: Ziogas, Alexandros Nikolaos organization: ETH Zurich Swiss National Computing Center, Department of Computer Science, Switzerland
– sequence: 5 givenname: Jens Eirik surname: Saethre fullname: Saethre, Jens Eirik organization: ETH Zurich Swiss National Computing Center, Department of Computer Science, Switzerland
– sequence: 6 givenname: Andre surname: Gaillard fullname: Gaillard, Andre organization: ETH Zurich Swiss National Computing Center, Department of Computer Science, Switzerland
– sequence: 7 givenname: Timo surname: Schneider fullname: Schneider, Timo organization: ETH Zurich Swiss National Computing Center, Department of Computer Science, Switzerland
– sequence: 8 givenname: Maciej surname: Besta fullname: Besta, Maciej organization: ETH Zurich Swiss National Computing Center, Department of Computer Science, Switzerland
– sequence: 9 givenname: Anton surname: Kozhevnikov fullname: Kozhevnikov, Anton organization: ETH Zurich Swiss National Computing Center, Department of Computer Science, Switzerland
– sequence: 10 givenname: Joost surname: VandeVondele fullname: VandeVondele, Joost organization: ETH Zurich Swiss National Computing Center, Department of Computer Science, Switzerland
– sequence: 11 givenname: Torsten surname: Hoefler fullname: Hoefler, Torsten organization: ETH Zurich Swiss National Computing Center, Department of Computer Science, Switzerland
ContentType	Conference Proceeding
DOI	10.1145/3458817.3476167
Discipline	Computer Science
EISBN	9781450384421 1450384420
EISSN	2167-4337
EndPage	15
ExternalDocumentID	9910123
Genre	orig-research
GrantInformation_xml	– fundername: Swiss National Science Foundation grantid: 185778 funderid: 10.13039/501100001711 – fundername: European Research Council (ERC) funderid: 10.13039/100010663
ISICitedReferencesCount	8
IsDoiOpenAccess	false
IsOpenAccess	true
IsPeerReviewed	false
IsScholarly	false
Language	English
PageCount	15
PublicationCentury	2000
PublicationDate	2021-Nov.-14
PublicationDateYYYYMMDD	2021-11-14
PublicationDate_xml	– month: 11 year: 2021 text: 2021-Nov.-14 day: 14
PublicationDecade	2020
PublicationTitle	SC21: International Conference for High Performance Computing, Networking, Storage and Analysis
PublicationTitleAbbrev	SC
PublicationYear	2021
Publisher	ACM
Publisher_xml	– name: ACM
StartPage	1
URI	https://ieeexplore.ieee.org/document/9910123