A High-Level API for Dynamic Load Balancing in Large-Scale Parameter Sweeps

Bibliographic details
Published in: International Journal of Parallel Programming, Vol. 53, No. 4, p. 25
Main authors: Salzmann, Philip; Thoman, Peter; Fahringer, Thomas
Format: Journal Article
Language: English
Published: New York: Springer US, 1 August 2025
Springer Nature B.V
ISSN:0885-7458, 1573-7640
Description
Summary: Parameter sweep studies, such as virtual drug screening, may exhibit significant load imbalance during batched execution on large-scale clusters, resulting in idle and thus wasted resources. Work stealing is a popular method for dynamic load balancing in such scenarios. However, we demonstrate that work stealing alone falls short in cases where a few ranks generate long-running, high-cost jobs, particularly towards the end of a computation. To address this challenge, we propose high-cost probing, a mechanism for distributing high-cost jobs across workers early during program execution by leveraging user-provided cost hints. We extend the Celerity programming model for distributed accelerator computing with a high-level API tailored to expressing parameter sweep-style workflows that may benefit from high-cost probing. We demonstrate the effectiveness of our approach on synthetic benchmarks as well as a real-world virtual screening application with a highly irregular workload, achieving a 73 percentage point reduction in load imbalance on 128 GPUs.
DOI:10.1007/s10766-025-00804-4