HyperX Topology: First At-Scale Implementation and Comparison to the Fat-Tree

Bibliographic Details
Published in: SC19: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-23
Main Authors: Domke, Jens; Matsuoka, Satoshi; Ivanov, Ivan R.; Tsushima, Yuki; Yuki, Tomoya; Nomura, Akihiro; Miura, Shin'ichi; McDonald, Nic; Floyd, Dennis L.; Dube, Nicolas
Format: Conference Proceeding
Language: English
Published: ACM 17.11.2019
ISSN: 2167-4337
Description
Summary: The de facto standard topology for modern HPC systems and data centers is the Folded Clos network, commonly known as the Fat-Tree. The number of network endpoints in these systems is steadily increasing. Switch radix growth is not keeping up, forcing an increased path length in these multi-level trees that will limit gains for latency-sensitive applications. Additionally, today's Fat-Trees force the extensive use of active optical cables, which carry a prohibitive cost structure at scale. To tackle these issues, researchers have proposed various low-diameter topologies, such as Dragonfly. Another novel option, so far studied only theoretically, is the HyperX. We built the world's first 3 Pflop/s supercomputer with two separate networks, a 3-level Fat-Tree and a 12×8 HyperX. This dual-plane system allows us to perform a side-by-side comparison using a broad set of benchmarks. We show that the HyperX, together with our novel communication pattern-aware routing, can challenge the performance of, or even outperform, traditional Fat-Trees.
DOI: 10.1145/3295500.3356140