On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal Matrix Factorizations
Matrix factorizations are among the most important building blocks of scientific computing. However, state-of-the-art libraries are not communication-optimal, underutilizing current parallel architectures. We present novel algorithms for Cholesky and LU factorizations that utilize an asymptotically...
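The full abstract in this record states that both the Cholesky and LU schedules communicate N^3/(P*sqrt(M)) elements per processor, where M is the local memory size. The following Python sketch is not taken from the paper's library. It evaluates only that leading-order expression. The function names and the sample value of M are hypothetical, and constant factors and lower-order terms are assumed to be omitted.

```python
import math


def comm_elements_per_processor(n: int, p: int, m: int) -> float:
    """Leading-order term N^3 / (P * sqrt(M)) quoted in the abstract.

    n: matrix dimension N, p: number of processors P,
    m: local memory size M, measured in matrix elements (assumption).
    """
    if n <= 0 or p <= 0 or m <= 0:
        raise ValueError("n, p and m must be positive")
    return n**3 / (p * math.sqrt(m))


def total_comm_elements(n: int, p: int, m: int) -> float:
    """Sum over all P processors of the per-processor term: N^3 / sqrt(M)."""
    return p * comm_elements_per_processor(n, p, m)


if __name__ == "__main__":
    # Hypothetical local memory size in elements; not a value from the paper.
    local_memory = 2**27
    # Matrix sizes and the 512-node count are the extremes named in the abstract;
    # pairing them with P like this is only for illustration.
    for n, p in [(2048, 4), (524288, 512)]:
        per_proc = comm_elements_per_processor(n, p, local_memory)
        print(f"N={n}, P={p}: about {per_proc:.3e} elements per processor")
```

Under this expression, quadrupling the local memory M halves the per-processor volume, and the total over all processors, N^3/sqrt(M), does not depend on P.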
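| Published in: | SC21: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-15 |
|---|---|
| Main authors: | Kwasniewski, Grzegorz; Kabic, Marko; Ben-Nun, Tal; Ziogas, Alexandros Nikolaos; Saethre, Jens Eirik; Gaillard, Andre; Schneider, Timo; Besta, Maciej; Kozhevnikov, Anton; VandeVondele, Joost; Hoefler, Torsten |
| Format: | Conference paper |
| Language: | English |
| Publication details: | ACM, 14 November 2021 |
| Subject: | |
| ISSN: | 2167-4337 |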
| Abstract | Matrix factorizations are among the most important building blocks of scientific computing. However, state-of-the-art libraries are not communication-optimal, underutilizing current parallel architectures. We present novel algorithms for Cholesky and LU factorizations that utilize an asymptotically communication-optimal 2.5D decomposition. We first establish a theoretical framework for deriving parallel I/O lower bounds for linear algebra kernels, and then utilize its insights to derive Cholesky and LU schedules, both communicating $N^{3}/(P\sqrt{M})$ elements per processor, where $M$ is the local memory size. The empirical results match our theoretical analysis: our implementations communicate significantly less than Intel MKL, SLATE, and the asymptotically communication-optimal CANDMC and CAPITAL libraries. Our code outperforms these state-of-the-art libraries in almost all tested scenarios, with matrix sizes ranging from 2,048 to 524,288 on up to 512 CPU nodes of the Piz Daint supercomputer, decreasing the time-to-solution by up to three times. Our code is ScaLAPACK-compatible and available as an open-source library. |
|---|---|
| Author | VandeVondele, Joost; Ben-Nun, Tal; Gaillard, Andre; Besta, Maciej; Hoefler, Torsten; Kabic, Marko; Ziogas, Alexandros Nikolaos; Saethre, Jens Eirik; Schneider, Timo; Kwasniewski, Grzegorz; Kozhevnikov, Anton |
| Author_xml | 1. Kwasniewski, Grzegorz; 2. Kabic, Marko; 3. Ben-Nun, Tal; 4. Ziogas, Alexandros Nikolaos; 5. Saethre, Jens Eirik; 6. Gaillard, Andre; 7. Schneider, Timo; 8. Besta, Maciej; 9. Kozhevnikov, Anton; 10. VandeVondele, Joost; 11. Hoefler, Torsten. Organization (identical for all eleven authors): ETH Zurich Swiss National Computing Center, Department of Computer Science, Switzerland |
| ContentType | Conference Proceeding |
| DOI | 10.1145/3458817.3476167 |
| Discipline | Computer Science |
| EISBN | 9781450384421 1450384420 |
| EISSN | 2167-4337 |
| EndPage | 15 |
| ExternalDocumentID | 9910123 |
| Genre | orig-research |
| GrantInformation_xml | Swiss National Science Foundation (grant ID 185778; funder ID 10.13039/501100001711); European Research Council (ERC) (funder ID 10.13039/100010663) |
| ISICitedReferencesCount | 8 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| PageCount | 15 |
| PublicationCentury | 2000 |
| PublicationDate | 2021-Nov.-14 |
| PublicationDateYYYYMMDD | 2021-11-14 |
| PublicationDate_xml | – month: 11 year: 2021 text: 2021-Nov.-14 day: 14 |
| PublicationDecade | 2020 |
| PublicationTitle | SC21: International Conference for High Performance Computing, Networking, Storage and Analysis |
| PublicationTitleAbbrev | SC |
| PublicationYear | 2021 |
| Publisher | ACM |
| Publisher_xml | – name: ACM |
| StartPage | 1 |
| SubjectTerms | Codes; communication complexity; Distributed linear algebra algorithms; Layout; Libraries; matrix factorization; Schedules; Scientific computing; Supercomputers; Three-dimensional displays |
| Title | On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal Matrix Factorizations |
| URI | https://ieeexplore.ieee.org/document/9910123 |