On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal Matrix Factorizations


Bibliographic details
Published in: SC21: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-15
Main authors: Kwasniewski, Grzegorz; Kabic, Marko; Ben-Nun, Tal; Ziogas, Alexandros Nikolaos; Saethre, Jens Eirik; Gaillard, Andre; Schneider, Timo; Besta, Maciej; Kozhevnikov, Anton; VandeVondele, Joost; Hoefler, Torsten
Format: Conference paper
Language: English
Publication details: ACM, 14 November 2021
Subjects:
ISSN: 2167-4337
Abstract Matrix factorizations are among the most important building blocks of scientific computing. However, state-of-the-art libraries are not communication-optimal, underutilizing current parallel architectures. We present novel algorithms for Cholesky and LU factorizations that utilize an asymptotically communication-optimal 2.5D decomposition. We first establish a theoretical framework for deriving parallel I/O lower bounds for linear algebra kernels, and then utilize its insights to derive Cholesky and LU schedules, both communicating $N^{3}/(P\sqrt{M})$ elements per processor, where M is the local memory size. The empirical results match our theoretical analysis: our implementations communicate significantly less than Intel MKL, SLATE, and the asymptotically communication-optimal CANDMC and CAPITAL libraries. Our code outperforms these state-of-the-art libraries in almost all tested scenarios, with matrix sizes ranging from 2,048 to 524,288 on up to 512 CPU nodes of the Piz Daint supercomputer, decreasing the time-to-solution by up to three times. Our code is ScaLAPACK-compatible and available as an open-source library.
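As a rough illustration of the per-processor communication volume quoted in the abstract, the sketch below evaluates only the leading-order term N^3/(P*sqrt(M)). The function name `comm_per_processor` and the sample values of N, P, and M are assumptions chosen for illustration. Constant factors and lower-order terms are not given in the abstract, so they are left out here.

```python
import math


def comm_per_processor(n: int, p: int, m: float) -> float:
    """Leading-order communication volume per processor, in matrix elements.

    Evaluates N^3 / (P * sqrt(M)) as stated in the abstract, where
    n is the matrix dimension N, p is the number of processors P, and
    m is the local memory size M (in elements). Constants and
    lower-order terms are deliberately not modeled.
    """
    if n <= 0 or p <= 0 or m <= 0:
        raise ValueError("n, p, and m must all be positive")
    return n**3 / (p * math.sqrt(m))


if __name__ == "__main__":
    # Hypothetical configuration: N = 2^14, P = 2^9, M = 2^26 elements.
    base = comm_per_processor(16384, 512, 2**26)
    print(base)  # 1048576.0

    # Quadrupling the local memory halves the leading-order term,
    # because the volume scales with 1 / sqrt(M).
    print(comm_per_processor(16384, 512, 4 * 2**26))  # 524288.0
```

The second call shows the dependence the 2.5D decomposition exploits: the leading-order volume falls with the square root of the local memory, so spending extra memory reduces the data each processor has to communicate.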
Author VandeVondele, Joost
Ben-Nun, Tal
Gaillard, Andre
Besta, Maciej
Hoefler, Torsten
Kabic, Marko
Ziogas, Alexandros Nikolaos
Saethre, Jens Eirik
Schneider, Timo
Kwasniewski, Grzegorz
Kozhevnikov, Anton
Author_xml – sequence: 1
  givenname: Grzegorz
  surname: Kwasniewski
  fullname: Kwasniewski, Grzegorz
  organization: ETH Zurich Swiss National Computing Center,Department of Computer Science,Switzerland
– sequence: 2
  givenname: Marko
  surname: Kabic
  fullname: Kabic, Marko
  organization: ETH Zurich Swiss National Computing Center,Department of Computer Science,Switzerland
– sequence: 3
  givenname: Tal
  surname: Ben-Nun
  fullname: Ben-Nun, Tal
  organization: ETH Zurich Swiss National Computing Center,Department of Computer Science,Switzerland
– sequence: 4
  givenname: Alexandros Nikolaos
  surname: Ziogas
  fullname: Ziogas, Alexandros Nikolaos
  organization: ETH Zurich Swiss National Computing Center,Department of Computer Science,Switzerland
– sequence: 5
  givenname: Jens Eirik
  surname: Saethre
  fullname: Saethre, Jens Eirik
  organization: ETH Zurich Swiss National Computing Center,Department of Computer Science,Switzerland
– sequence: 6
  givenname: Andre
  surname: Gaillard
  fullname: Gaillard, Andre
  organization: ETH Zurich Swiss National Computing Center,Department of Computer Science,Switzerland
– sequence: 7
  givenname: Timo
  surname: Schneider
  fullname: Schneider, Timo
  organization: ETH Zurich Swiss National Computing Center,Department of Computer Science,Switzerland
– sequence: 8
  givenname: Maciej
  surname: Besta
  fullname: Besta, Maciej
  organization: ETH Zurich Swiss National Computing Center,Department of Computer Science,Switzerland
– sequence: 9
  givenname: Anton
  surname: Kozhevnikov
  fullname: Kozhevnikov, Anton
  organization: ETH Zurich Swiss National Computing Center,Department of Computer Science,Switzerland
– sequence: 10
  givenname: Joost
  surname: VandeVondele
  fullname: VandeVondele, Joost
  organization: ETH Zurich Swiss National Computing Center,Department of Computer Science,Switzerland
– sequence: 11
  givenname: Torsten
  surname: Hoefler
  fullname: Hoefler, Torsten
  organization: ETH Zurich Swiss National Computing Center,Department of Computer Science,Switzerland
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1145/3458817.3476167
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 9781450384421
1450384420
EISSN 2167-4337
EndPage 15
ExternalDocumentID 9910123
Genre orig-research
GrantInformation_xml – fundername: Swiss National Science Foundation
  grantid: 185778
  funderid: 10.13039/501100001711
– fundername: European Research Council (ERC)
  funderid: 10.13039/100010663
GroupedDBID 6IE
6IF
6IH
6IK
6IL
6IN
AAWTH
ABLEC
ADZIZ
ALMA_UNASSIGNED_HOLDINGS
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
IEGSK
IPLJI
OCL
RIE
RIL
ID FETCH-LOGICAL-a275t-adabedd1e6ab857821583d297d56d215b39f30fa2b4a795d0a4b0df88487b8333
IEDL.DBID RIE
ISICitedReferencesCount 8
IngestDate Wed Aug 27 02:18:42 EDT 2025
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-a275t-adabedd1e6ab857821583d297d56d215b39f30fa2b4a795d0a4b0df88487b8333
PageCount 15
ParticipantIDs ieee_primary_9910123
PublicationCentury 2000
PublicationDate 2021-Nov.-14
PublicationDateYYYYMMDD 2021-11-14
PublicationDate_xml – month: 11
  year: 2021
  text: 2021-Nov.-14
  day: 14
PublicationDecade 2020
PublicationTitle SC21: International Conference for High Performance Computing, Networking, Storage and Analysis
PublicationTitleAbbrev SC
PublicationYear 2021
Publisher ACM
Publisher_xml – name: ACM
SSID ssj0002871320
ssj0003204180
Score 1.8967717
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms Codes
communication complexity
Distributed linear algebra algorithms
Layout
Libraries
matrix factorization
Schedules
Scientific computing
Supercomputers
Three-dimensional displays
Title On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal Matrix Factorizations
URI https://ieeexplore.ieee.org/document/9910123