Examining the Viability of Row-Scale Disaggregation for Production Applications

Bibliographic details
Published in:	SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1613-1621
Main authors:	Shorts, Curtis; Grant, Ryan E.
Format:	Conference paper
Language:	English
Published:	IEEE, 17 November 2024
Subjects:	cdi; composable disaggregated infrastructure; Computer architecture; cuda; Graphics processing units; High performance computing; hpc; Kernel; Mathematical models; Optical fiber cables; Production; Resource management; row-scaled cdi; slack insertion; Software; Testing

Description
Summary:	Row-scale Composable Disaggregated Infrastructure (CDI) is a heterogeneous high performance computing (HPC) architecture that relocates the GPUs to a single chassis from which CPU nodes can then request compute resources. This architecture is distinctly different from rack-scaled CDI because the GPUs are accessed over a network rather than residing in the same PCIe domain as the CPUs. Row-scale CDI expands the benefits and flexibility of rack-scaled CDI while introducing new challenges. For example, with row-scale CDI one must account for the effects of "slack", the added latency in CPU-to-GPU communication caused by network delays. This work assesses the potential challenges of row-scale CDI to determine which factors matter most when deploying a CDI system. Our strong-scaling application analyses reveal two types of HPC workloads that may benefit from row-scale CDI: those that are CPU dominant and periodically call on the GPU for highly parallel tasks, and those that are GPU dominant and rely on the CPU primarily to coordinate work. We compare the kernel and data transfer characteristics of each application with those of a slack proxy application, which allowed us to develop a mathematical model that predicts the performance penalty different applications can face as a result of slack. To illustrate this, we profile two applications using our proposed method and find that, pessimistically, they would see a performance penalty of less than 1% above the effects of crossing the network in an environment that induces 100 µs of slack, equivalent to a distance of 20 km at the speed of light in a fibre optic network cable. This demonstrates that both row-scale and cluster-scale CDI are viable technologies from an application performance perspective.
DOI:	10.1109/SCW63240.2024.00201
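
The summary equates 100 µs of slack with roughly 20 km of fibre. The short Python sketch below reproduces that arithmetic. It rests on assumptions not stated in the record: the slack is treated as a one-way propagation delay, and light is taken to travel through optical fibre at about 2 × 10^8 m/s (a refractive index near 1.5). The function names are hypothetical and are not taken from the paper.

```python
# Assumed propagation speed of light in optical fibre, roughly c / 1.5.
# This value is an assumption for illustration; the record does not state it.
FIBRE_SPEED_M_PER_S = 2.0e8


def slack_to_distance_km(slack_us: float,
                         speed_m_per_s: float = FIBRE_SPEED_M_PER_S) -> float:
    """Convert a one-way delay in microseconds to a fibre length in kilometres."""
    # microseconds * (m/s) gives 1e-6 m; dividing by 1e9 yields kilometres.
    return slack_us * speed_m_per_s / 1e9


def distance_to_slack_us(distance_km: float,
                         speed_m_per_s: float = FIBRE_SPEED_M_PER_S) -> float:
    """Convert a fibre length in kilometres to a one-way delay in microseconds."""
    # kilometres / (m/s) gives 1e3 s; multiplying by 1e9 yields microseconds.
    return distance_km * 1e9 / speed_m_per_s


if __name__ == "__main__":
    # 100 microseconds of slack corresponds to about 20 km under these assumptions.
    print(slack_to_distance_km(100.0))
```

Under a different assumed fibre speed, or if slack were read as a round-trip delay, the equivalent distance would change accordingly.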