GPARS: Graph predictive algorithm for efficient resource scheduling in heterogeneous GPU clusters

Bibliographic details
Published in:	Future Generation Computer Systems, Vol. 152, pp. 127-137
Main authors:	Wang, Sheng; Chen, Shiping; Shi, Yumei
Format:	Journal Article
Language:	English
Published:	Elsevier B.V., 1 March 2024
Subjects:	Graph attention networks; Heterogeneous GPU clusters; Job duration; Resource scheduling; Resource utilization; Waiting time
ISSN:	0167-739X, 1872-7115

Description
Summary:	Efficient resource scheduling in heterogeneous graphics processing unit (GPU) clusters is critical for maximizing system performance and optimizing resource utilization. Prior work on resource scheduling has typically used machine learning (ML) algorithms to estimate job durations or GPU utilization in the cluster from training progress and task speed. These studies often overlooked both the performance variations among the different GPU types within such clusters and the spatiotemporal correlations among jobs. To address these limitations, this paper introduces the graph predictive algorithm for efficient resource scheduling (GPARS), designed specifically for heterogeneous clusters. GPARS leverages spatiotemporal correlations among jobs and uses graph attention networks (GANs) to predict job duration. Building on the prediction results, we develop a dynamic objective function that allocates suitable GPU types to newly submitted jobs. Through a comprehensive analysis of Alibaba’s heterogeneous GPU cluster, we examine the impact of GPU capacity and type on job completion time (JCT) and resource utilization. Our evaluation, using real traces from Alibaba and Philly, shows that GPARS reduces waiting time by 10.29% and improves resource utilization by 7.47% on average compared with the original scheduling method. These findings indicate that GPARS improves resource scheduling in heterogeneous GPU clusters.
• Analyzing Alibaba’s GPU cluster, we study the impact of GPU capacity and type on job completion time and resource use. Our findings highlight overcrowding on weaker GPUs and load imbalance on high-end machines.
• Our GAN approach achieves strong prediction accuracy, with an RMSE of 0.0237 and an MAE of 0.0073. Our study confirms spatiotemporal correlations in job durations within the cluster.
• Leveraging our predictions, we introduce GPARS, an efficient scheduling approach for job-GPU allocation. Evaluated with Alibaba and Philly traces, GPARS cuts waiting time by 10.29%.
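The summary reports prediction accuracy as RMSE and MAE. As a reminder of what those two figures measure, the following is a minimal sketch of both metrics in plain Python. The function names `rmse` and `mae` and the sample values are illustrative assumptions, not taken from the paper, and the numbers are not the paper's data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)


# Hypothetical normalized job durations (assumed values for illustration only).
actual = [0.10, 0.40, 0.25, 0.80]
predicted = [0.12, 0.38, 0.25, 0.76]

print("RMSE:", rmse(actual, predicted))
print("MAE:", mae(actual, predicted))
```

RMSE penalizes large individual errors more heavily than MAE, so an RMSE noticeably above the MAE, as in the reported 0.0237 versus 0.0073, is consistent with a small number of jobs being predicted much less accurately than the rest.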
DOI:	10.1016/j.future.2023.10.022