Unified Communication Optimization Strategies for Sparse Triangular Solver on CPU and GPU Clusters

Published in: International Conference for High Performance Computing, Networking, Storage and Analysis (Online), pp. 1-15
Main authors: Liu, Yang; Ding, Nan; Sao, Piyush; Williams, Samuel; Li, Xiaoye Sherry
Format: Conference paper
Language: English
Published: ACM, 11 November 2023
ISSN: 2167-4337
Abstract This paper presents a unified communication optimization framework for sparse triangular solve (SpTRSV) algorithms on CPU and GPU clusters. The framework builds upon a 3D communication-avoiding (CA) layout of P_{x}\times P_{y}\times P_{z} processes that divides a sparse matrix into P_{z} submatrices, each handled by a P_{x}\times P_{y} 2D grid with block-cyclic distribution. We propose three communication optimization strategies. First, a new 3D SpTRSV algorithm is developed, which trades inter-grid communication and synchronization for replicated computation. This design requires only one inter-grid synchronization, and the inter-grid communication is efficiently implemented with sparse allreduce operations. Second, broadcast and reduction communication trees are used to reduce the message latency of the intra-grid 2D communication on CPU clusters. Finally, we leverage GPU-initiated one-sided communication to implement the communication trees on GPU clusters. With these nested inter- and intra-grid communication optimization strategies, the proposed 3D SpTRSV algorithm attains up to 3.45x speedup over the baseline 3D SpTRSV algorithm using up to 2048 Cori Haswell CPU cores. In addition, the proposed GPU 3D SpTRSV algorithm achieves up to 6.5x speedup over the proposed CPU 3D SpTRSV algorithm with P_{z} up to 64. Notably, the proposed GPU 3D SpTRSV scales to 256 GPUs on the Perlmutter system, whereas the existing 2D SpTRSV algorithm scales only up to 4 GPUs.
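The layout and tree ideas named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the rank ordering (z slowest, then x, then y), the binary tree shape, and all function names below are assumptions chosen only to make the concepts concrete.

```python
def rank_to_coords(rank, px, py, pz):
    """Map a flat process rank to (x, y, z) in a Px x Py x Pz layout.

    Assumed ordering: z varies slowest, so each z value selects one 2D grid.
    """
    if not 0 <= rank < px * py * pz:
        raise ValueError("rank out of range")
    z, r = divmod(rank, px * py)
    x, y = divmod(r, py)
    return x, y, z


def block_owner(i, j, px, py):
    """Grid coordinates owning block (i, j) under 2D block-cyclic distribution."""
    return i % px, j % py


def tree_children(node, n):
    """Children of `node` in a binary tree over n grid-local ranks rooted at 0.

    A binary tree is an assumed shape; the abstract only says "communication trees".
    """
    return [c for c in (2 * node + 1, 2 * node + 2) if c < n]


def tree_depth(n):
    """Hops from the root to the deepest leaf of the binary tree with n nodes."""
    if n < 1:
        raise ValueError("n must be positive")
    return n.bit_length() - 1
```

With a tree of this kind, a broadcast among n processes needs on the order of log2(n) sequential hops instead of n - 1 sends issued by the root, which is the kind of latency reduction the abstract attributes to communication trees.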
Author Liu, Yang
Sao, Piyush
Ding, Nan
Williams, Samuel
Li, Xiaoye Sherry
Author_xml – sequence: 1
  givenname: Yang
  surname: Liu
  fullname: Liu, Yang
  email: liuyangzhuan@lbl.gov
  organization: Lawrence Berkeley National Laboratory,Berkeley,CA,USA
– sequence: 2
  givenname: Nan
  surname: Ding
  fullname: Ding, Nan
  email: nanding@lbl.gov
  organization: Lawrence Berkeley National Laboratory,Berkeley,CA,USA
– sequence: 3
  givenname: Piyush
  surname: Sao
  fullname: Sao, Piyush
  email: saopk@ornl.gov
  organization: Oak Ridge National Laboratory,Oak Ridge,TN,USA
– sequence: 4
  givenname: Samuel
  surname: Williams
  fullname: Williams, Samuel
  email: swwilliams@lbl.gov
  organization: Lawrence Berkeley National Laboratory,Berkeley,CA,USA
– sequence: 5
  givenname: Xiaoye Sherry
  surname: Li
  fullname: Li, Xiaoye Sherry
  email: xsli@lbl.gov
  organization: Lawrence Berkeley National Laboratory,Berkeley,CA,USA
ContentType Conference Proceeding
DOI 10.1145/3581784.3607092
Discipline Computer Science
EISBN 9798400701092
EISSN 2167-4337
EndPage 15
ExternalDocumentID 10485083
Genre orig-research
GrantInformation_xml – fundername: U.S. Department of Energy
  grantid: DE-AC05-00OR22725
  funderid: 10.13039/100000015
– fundername: U.S. Department of Energy
  funderid: 10.13039/100000015
ISICitedReferencesCount 2
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
Language English
OpenAccessLink https://dl.acm.org/doi/pdf/10.1145/3581784.3607092
PageCount 15
PublicationCentury 2000
PublicationDate 2023-Nov.-11
PublicationDateYYYYMMDD 2023-11-11
PublicationDate_xml – month: 11
  year: 2023
  text: 2023-Nov.-11
  day: 11
PublicationDecade 2020
PublicationTitle International Conference for High Performance Computing, Networking, Storage and Analysis (Online)
PublicationTitleAbbrev SC
PublicationYear 2023
Publisher ACM
Publisher_xml – name: ACM
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms Clustering algorithms
communication optimization
communication-avoiding algorithm
Graphics processing units
High performance computing
Layout
NVSHMEM
Parallel processing
Scalability
sparse matrix
SpTRSV
supernodal method
Three-dimensional displays
triangular solve
Title Unified Communication Optimization Strategies for Sparse Triangular Solver on CPU and GPU Clusters
URI https://ieeexplore.ieee.org/document/10485083