Unified Communication Optimization Strategies for Sparse Triangular Solver on CPU and GPU Clusters
This paper presents a unified communication optimization framework for sparse triangular solve (SpTRSV) algorithms on CPU and GPU clusters. The framework builds upon a 3D communication-avoiding (CA) layout of $P_x \times P_y \times P_z$ processes that divides a sparse matrix into $P_z$ submatrice...
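
The layout described above can be illustrated with a small sketch. It maps a flat process rank to a coordinate in a $P_x \times P_y \times P_z$ layout and finds the owner of a matrix block under a 2D block-cyclic distribution. The rank ordering and the names `rank_to_coord` and `block_owner` are assumptions made for illustration; the record does not say how the paper numbers its processes.

```python
from typing import NamedTuple, Tuple


class GridCoord(NamedTuple):
    px: int  # row index inside one 2D grid
    py: int  # column index inside one 2D grid
    pz: int  # which 2D grid (one grid per submatrix)


def rank_to_coord(rank: int, Px: int, Py: int, Pz: int) -> GridCoord:
    """Map a flat rank to (px, py, pz). The ordering is an assumed convention."""
    if not 0 <= rank < Px * Py * Pz:
        raise ValueError("rank out of range for the given layout")
    pz, rem = divmod(rank, Px * Py)  # consecutive ranks fill one 2D grid first
    px, py = divmod(rem, Py)         # row-major order inside the 2D grid
    return GridCoord(px, py, pz)


def block_owner(i: int, j: int, Px: int, Py: int) -> Tuple[int, int]:
    """Owner (px, py) of block (i, j) under a 2D block-cyclic distribution."""
    return (i % Px, j % Py)
```

With `Px=2, Py=3, Pz=4`, ranks 0 to 5 form the first 2D grid and rank 6 starts the second one. Block `(5, 7)` of a submatrix lands on process `(1, 1)` of its grid.
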
| Published in: | International Conference for High Performance Computing, Networking, Storage and Analysis (Online), pp. 1-15 |
|---|---|
| Main authors: | Liu, Yang; Ding, Nan; Sao, Piyush; Williams, Samuel; Li, Xiaoye Sherry |
| Format: | Conference paper |
| Language: | English |
| Published: | ACM, 11 November 2023 |
| Subjects: | |
| ISSN: | 2167-4337 |
| Online access: | Get full text |
| Abstract | This paper presents a unified communication optimization framework for sparse triangular solve (SpTRSV) algorithms on CPU and GPU clusters. The framework builds upon a 3D communication-avoiding (CA) layout of $P_x \times P_y \times P_z$ processes that divides a sparse matrix into $P_z$ submatrices, each handled by a $P_x \times P_y$ 2D grid with block-cyclic distribution. We propose three communication optimization strategies. First, a new 3D SpTRSV algorithm is developed that trades inter-grid communication and synchronization for replicated computation. This design requires only one inter-grid synchronization, and the inter-grid communication is efficiently implemented with sparse allreduce operations. Second, broadcast and reduction communication trees are used to reduce the message latency of the intra-grid 2D communication on CPU clusters. Finally, we leverage GPU-initiated one-sided communication to implement the communication trees on GPU clusters. With these nested inter- and intra-grid communication optimization strategies, the proposed 3D SpTRSV algorithm attains up to 3.45x speedups over the baseline 3D SpTRSV algorithm using up to 2048 Cori Haswell CPU cores. In addition, the proposed GPU 3D SpTRSV algorithm achieves up to 6.5x speedups over the proposed CPU 3D SpTRSV algorithm with $P_z$ up to 64. Finally, the proposed GPU 3D SpTRSV scales to 256 GPUs on the Perlmutter system, whereas the existing 2D SpTRSV algorithm scales only up to 4 GPUs. |
|---|---|
| Author | Liu, Yang; Sao, Piyush; Ding, Nan; Williams, Samuel; Li, Xiaoye Sherry |
| Author_xml | 1. Yang Liu (liuyangzhuan@lbl.gov), Lawrence Berkeley National Laboratory, Berkeley, CA, USA; 2. Nan Ding (nanding@lbl.gov), Lawrence Berkeley National Laboratory, Berkeley, CA, USA; 3. Piyush Sao (saopk@ornl.gov), Oak Ridge National Laboratory, Oak Ridge, TN, USA; 4. Samuel Williams (swwilliams@lbl.gov), Lawrence Berkeley National Laboratory, Berkeley, CA, USA; 5. Xiaoye Sherry Li (xsli@lbl.gov), Lawrence Berkeley National Laboratory, Berkeley, CA, USA |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1145/3581784.3607092 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Xplore Digital Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 9798400701092 |
| EISSN | 2167-4337 |
| EndPage | 15 |
| ExternalDocumentID | 10485083 |
| Genre | orig-research |
| GrantInformation_xml | 1. U.S. Department of Energy (funder ID 10.13039/100000015), grant DE-AC05-00OR22725; 2. U.S. Department of Energy (funder ID 10.13039/100000015) |
| GroupedDBID | 6IE 6IF 6IH 6IK 6IL 6IN AAWTH ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IPLJI OCL RIE RIL |
| ID | FETCH-LOGICAL-a387t-a7cd341e37a504a526ae0de0cc97f028b81d6fc0667248f76d88bf9ec85e2b103 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 2 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001461755900036&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Wed Aug 27 02:09:33 EDT 2025 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-a387t-a7cd341e37a504a526ae0de0cc97f028b81d6fc0667248f76d88bf9ec85e2b103 |
| OpenAccessLink | https://dl.acm.org/doi/pdf/10.1145/3581784.3607092 |
| PageCount | 15 |
| ParticipantIDs | ieee_primary_10485083 |
| PublicationCentury | 2000 |
| PublicationDate | 2023-Nov.-11 |
| PublicationDateYYYYMMDD | 2023-11-11 |
| PublicationDate_xml | – month: 11 year: 2023 text: 2023-Nov.-11 day: 11 |
| PublicationDecade | 2020 |
| PublicationTitle | International Conference for High Performance Computing, Networking, Storage and Analysis (Online) |
| PublicationTitleAbbrev | SC |
| PublicationYear | 2023 |
| Publisher | ACM |
| Publisher_xml | – name: ACM |
| SSID | ssj0003204180 ssib053141430 |
| Score | 1.8728038 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 1 |
| SubjectTerms | Clustering algorithms; communication optimization; communication-avoiding algorithm; Graphics processing units; High performance computing; Layout; NVSHMEM; Parallel processing; Scalability; sparse matrix; SpTRSV; supernodal method; Three-dimensional displays; triangular solve |
| Title | Unified Communication Optimization Strategies for Sparse Triangular Solver on CPU and GPU Clusters |
| URI | https://ieeexplore.ieee.org/document/10485083 |
| WOSCitedRecordID | wos001461755900036&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
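
The abstract says that broadcast and reduction communication trees are used to reduce message latency inside each 2D grid. The sketch below shows why a tree helps, using a binary tree as an assumed shape: the record does not state which tree shape the paper uses, and the function names are hypothetical. Ranks are renumbered relative to the root so that any process can act as the broadcast root.

```python
from typing import List, Optional


def tree_children(rank: int, root: int, n: int) -> List[int]:
    """Ranks that `rank` forwards to in a binary tree rooted at `root`."""
    rel = (rank - root) % n  # position of this rank relative to the root
    return [(k + root) % n for k in (2 * rel + 1, 2 * rel + 2) if k < n]


def tree_parent(rank: int, root: int, n: int) -> Optional[int]:
    """Rank that `rank` receives from, or None for the root itself."""
    rel = (rank - root) % n
    if rel == 0:
        return None
    return ((rel - 1) // 2 + root) % n


def broadcast_rounds(root: int, n: int) -> List[List[int]]:
    """Group ranks by the number of hops the message needs to reach them."""
    rounds: List[List[int]] = []
    frontier = [root]
    while frontier:
        rounds.append(frontier)
        frontier = [c for r in frontier for c in tree_children(r, root, n)]
    return rounds
```

For 8 processes with root 3, the farthest process is three hops away and no process sends more than two messages. A flat broadcast would have the root send seven messages itself. A reduction can follow the same tree in the opposite direction, with each process sending its partial result to `tree_parent`.
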