PIAR: Path-Improved Adaptive Routing for Dragonfly Networks

Bibliographic details
Published in: Proceedings / IEEE International Conference on Cluster Computing, pp. 1-11
Main authors: Wang, Zhenghao; Wang, Qiang; Lai, Mingche; Xu, Jiaqing; Xu, Jinbo; Xie, Min; Chen, Guo
Format: Conference paper
Language: English
Published: IEEE, 2 September 2025
ISSN: 2168-9253
Description
Summary: For next-generation exascale supercomputing communication systems, the Dragonfly topology offers strong scalability, low latency, and cost efficiency. Dragonfly networks have already been implemented in current supercomputers and will continue to expand in future systems. Adaptive routing in Dragonfly topologies is critical for network performance. The traditional UGAL routing algorithm, which uses the Valiant mechanism to select non-minimal paths, does not adequately consider the impact of high hop counts in non-minimal paths and often increases the average path length unnecessarily, thereby increasing network latency and load. Furthermore, UGAL inaccurately estimates the congestion of the entire routing path based on local information, leading to suboptimal routing decisions that limit the algorithm's performance. In this paper, we propose PIAR, a novel path-improved adaptive routing algorithm. PIAR dynamically selects paths based on the status of local and global channels, prioritizing non-minimal paths with fewer hops to reduce network latency and load, thereby improving network performance. Additionally, we present the microarchitecture of the routing computation unit. Our evaluation results demonstrate that, compared with advanced algorithms such as PAR_PH, TPR, and UGAL LE, PIAR achieves an average throughput improvement of 19.2% and reduces latency by up to 13.4% under single synthetic traffic. Under mixed traffic, PIAR achieves an average throughput improvement of 23.6% and reduces latency by up to 33.8%. For application workloads, PIAR achieves an average reduction of 24.0% in packet latency.
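Note: The summary criticizes UGAL for judging a whole path from local information. As a rough illustration only, the commonly described local UGAL comparison (queue depth times hop count) might be sketched as below. This is not PIAR and is not taken from the paper; the `Candidate` type, the `ugal_local_choice` function, and the `bias` parameter are hypothetical names introduced for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One candidate path as seen from the source router (hypothetical type)."""
    hops: int         # total hop count of the path
    queue_depth: int  # occupancy of the local output queue the path starts on


def ugal_local_choice(minimal: Candidate, non_minimal: Candidate, bias: int = 0) -> str:
    """Return "minimal" or "non_minimal" using a UGAL-style local comparison.

    Estimated cost = local queue depth * hop count. Only the first-hop
    queue is visible locally, so congestion further along the path is not
    reflected in the estimate.
    """
    min_cost = minimal.queue_depth * minimal.hops
    # The optional bias (an assumption of this sketch) favors the minimal path.
    non_min_cost = non_minimal.queue_depth * non_minimal.hops + bias
    return "minimal" if min_cost <= non_min_cost else "non_minimal"


# A congested 3-hop minimal path versus a lightly loaded 5-hop detour.
choice = ugal_local_choice(Candidate(hops=3, queue_depth=10),
                           Candidate(hops=5, queue_depth=4))
```

In this sketch the detour wins (cost 20 against 30) even though it is two hops longer, which shows how a long non-minimal path can be chosen on local queue state alone.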
DOI: 10.1109/CLUSTER59342.2025.11186482