A High-Level API for Dynamic Load Balancing in Large-Scale Parameter Sweeps


Bibliographic Details
Published in: International journal of parallel programming, Vol. 53, No. 4, p. 25
Main Authors: Salzmann, Philip; Thoman, Peter; Fahringer, Thomas
Format: Journal Article
Language: English
Published: New York: Springer US, 1 August 2025
Springer Nature B.V
ISSN: 0885-7458, 1573-7640
Description
Summary: Parameter sweep studies, such as virtual drug screening, may exhibit significant load imbalance during batched execution on large-scale clusters, resulting in idle and thus wasted resources. Work stealing is a popular method for dynamic load balancing in such scenarios. However, we demonstrate that work stealing alone falls short in cases where a few ranks generate long-running, high-cost jobs, particularly towards the end of a computation. To address this challenge, we propose high-cost probing, a mechanism for distributing high-cost jobs across workers early during program execution by leveraging user-provided cost hints. We extend the Celerity programming model for distributed accelerator computing with a high-level API tailored to expressing parameter sweep-style workflows that may benefit from high-cost probing. We demonstrate the effectiveness of our approach on synthetic benchmarks as well as a real-world virtual screening application with a highly irregular workload, achieving a 73 percentage point reduction in load imbalance on 128 GPUs.
DOI: 10.1007/s10766-025-00804-4