PIAR: Path-Improved Adaptive Routing for Dragonfly Networks

For the next-generation exascale supercomputing communication systems, Dragonfly topology offers strong scalability, low latency, and cost efficiency. Dragonfly networks have already been implemented in current supercomputers and will continue to expand in future systems. Adaptive routing in Dragonf...

Detailed description

Bibliographic details
Published in: Proceedings / IEEE International Conference on Cluster Computing, pp. 1-11
Main authors: Wang, Zhenghao; Wang, Qiang; Lai, Mingche; Xu, Jiaqing; Xu, Jinbo; Xie, Min; Chen, Guo
Format: Conference proceeding
Language: English
Published: IEEE, 2 September 2025
ISSN: 2168-9253
Online access: Full text
Abstract For next-generation exascale supercomputing communication systems, the Dragonfly topology offers strong scalability, low latency, and cost efficiency. Dragonfly networks have already been deployed in current supercomputers and will continue to expand in future systems. Adaptive routing in Dragonfly topologies is critical for network performance. The traditional UGAL routing algorithm, which uses the Valiant mechanism to select non-minimal paths, does not adequately account for the high hop counts of non-minimal paths; it often increases the average path length unnecessarily, which in turn raises network latency and load. Furthermore, UGAL estimates the congestion of the entire routing path from local information alone, and the resulting inaccuracy leads to suboptimal routing decisions that limit the algorithm's performance. In this paper, we propose PIAR, a novel path-improved adaptive routing algorithm. PIAR dynamically selects paths based on the status of local and global channels, prioritizing non-minimal paths with fewer hops to reduce network latency and load, thereby improving network performance. Additionally, we present the microarchitecture of the routing computation unit. Our evaluation results demonstrate that, compared with advanced algorithms such as PAR_PH, TPR, and UGAL LE, PIAR achieves an average throughput improvement of 19.2% and reduces latency by up to 13.4% under single synthetic traffic. Under mixed traffic, PIAR achieves an average throughput improvement of 23.6% and reduces latency by up to 33.8%. For application workloads, PIAR achieves an average reduction of 24.0% in packet latency.
Author Xie, Min
Chen, Guo
Wang, Zhenghao
Wang, Qiang
Xu, Jinbo
Lai, Mingche
Xu, Jiaqing
Author_xml – sequence: 1
  givenname: Zhenghao
  surname: Wang
  fullname: Wang, Zhenghao
  email: zh_wang@nudt.edu.cn
  organization: College of Computer Science and Technology, National University of Defense Technology, Changsha, China
– sequence: 2
  givenname: Qiang
  surname: Wang
  fullname: Wang, Qiang
  email: qiangwang@nudt.edu.cn
  organization: College of Computer Science and Technology, National University of Defense Technology, Changsha, China
– sequence: 3
  givenname: Mingche
  surname: Lai
  fullname: Lai, Mingche
  email: mingchelai@nudt.edu.cn
  organization: College of Computer Science and Technology, National University of Defense Technology, Changsha, China
– sequence: 4
  givenname: Jiaqing
  surname: Xu
  fullname: Xu, Jiaqing
  email: xujiaqing@nudt.edu.cn
  organization: College of Computer Science and Technology, National University of Defense Technology, Changsha, China
– sequence: 5
  givenname: Jinbo
  surname: Xu
  fullname: Xu, Jinbo
  email: xujinbo@nudt.edu.cn
  organization: College of Computer Science and Technology, National University of Defense Technology, Changsha, China
– sequence: 6
  givenname: Min
  surname: Xie
  fullname: Xie, Min
  email: xiemin@nudt.edu.cn
  organization: College of Computer Science and Technology, National University of Defense Technology, Changsha, China
– sequence: 7
  givenname: Guo
  surname: Chen
  fullname: Chen, Guo
  email: guochen@hnu.edu.cn
  organization: College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/CLUSTER59342.2025.11186482
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 9798331530198
EISSN 2168-9253
EndPage 11
ExternalDocumentID 11186482
Genre orig-research
GrantInformation_xml – fundername: National Key Research and Development Program of China
  grantid: 2023YFB4403400
  funderid: 10.13039/501100012166
GroupedDBID 6IE
6IF
6IH
6IK
6IL
6IN
AAJGR
AAWTH
ABLEC
ADZIZ
ALMA_UNASSIGNED_HOLDINGS
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
IEGSK
IPLJI
OCL
RIE
RIL
RNS
ID FETCH-LOGICAL-i93t-37e73417b59e5da457fe66c8ef43be00e08beabc651fe577bf46ffbb7c808cc63
IEDL.DBID RIE
IngestDate Wed Oct 15 14:21:20 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-i93t-37e73417b59e5da457fe66c8ef43be00e08beabc651fe577bf46ffbb7c808cc63
PageCount 11
ParticipantIDs ieee_primary_11186482
PublicationCentury 2000
PublicationDate 2025-Sept.-2
PublicationDateYYYYMMDD 2025-09-02
PublicationDate_xml – month: 09
  year: 2025
  text: 2025-Sept.-2
  day: 02
PublicationDecade 2020
PublicationTitle Proceedings / IEEE International Conference on Cluster Computing
PublicationTitleAbbrev CLUSTER
PublicationYear 2025
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0037306
Score 2.3020515
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms adaptive routing
Adaptive systems
dragonfly topology
Heuristic algorithms
High-performance interconnection network
Network topology
Next generation networking
Routing
Scalability
Supercomputers
Throughput
Topology
Traffic control
Title PIAR: Path-Improved Adaptive Routing for Dragonfly Networks
URI https://ieeexplore.ieee.org/document/11186482
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE