Towards an Optimized Heterogeneous Distributed Task Scheduler in OpenMP Cluster

Bibliographic Details
Published in:	SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1894-1903
Main authors:	Neveu, Remy; Ceccato, Rodrigo; Leite, Gustavo; Araujo, Guido; Diaz, Jose M. Monsalve; Yviquel, Herve
Format:	Conference paper
Language:	English
Published:	IEEE, 17 November 2024
Subjects:	Cluster computing; Distributed architectures; Dynamic scheduling; Iterative methods; Large-scale systems; Optimal scheduling; Parallel processing; Parallel programming; Parallel systems; Resource management; Runtime; Scheduling algorithms; Scheduling and task partitioning

Description
Abstract:	This paper addresses the challenges of optimizing task scheduling for a distributed, task-based execution model in OpenMP for cluster computing environments. Traditional OpenMP implementations are primarily designed for shared-memory parallelism and offer limited control over task scheduling. However, improved scheduling mechanisms are critical to achieving performance and portability in distributed and heterogeneous environments. OpenMP Cluster (OMPC) was introduced to overcome these limitations, extending OpenMP with the Heterogeneous Earliest Finish Time (HEFT) task scheduling algorithm tailored for large-scale systems. To improve scheduling and enable better system utilization, the runtime system must resolve challenges such as changes in the application balance, the amount of parallelism, and varying communication latencies. This work presents three key contributions: first, the refactoring of the OMPC runtime to unify task scheduling across devices and hosts; second, the optimization of the HEFT-based scheduling algorithm to ensure efficient task execution in distributed environments; and third, an extensive evaluation of Work Stealing and HEFT scheduling mechanisms in real-world clusters. While the HEFT implementation in OMPC is not fully optimized, this work provides a significant step toward improving distributed task scheduling in cluster computing, offering insights and incremental advancements that support the development of scalable and high-performance applications. Results show improvements of up to 24% in scheduling time and open the way to further extensions of the scheduling methods.
DOI:	10.1109/SCW63240.2024.00239
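
Note on HEFT:	The abstract names the Heterogeneous Earliest Finish Time (HEFT) algorithm without describing it. The following is a minimal sketch of textbook list-scheduling HEFT, not the OMPC implementation. It ranks tasks by "upward rank" (average execution cost plus the longest remaining path to an exit task) and then places each task on the processor that gives the earliest finish time. It is simplified: it appends tasks to the end of each processor's timeline instead of using the insertion-based policy of the original algorithm, and it assumes strictly positive execution costs. The function name `heft_schedule`, its parameters, and the example task graph are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def heft_schedule(succ, cost, comm, n_procs):
    """Simplified HEFT (no insertion policy); all names are illustrative.

    succ: dict mapping each task to a list of its successor tasks
    cost: dict mapping each task to a list of execution times, one per processor
    comm: dict mapping (u, v) edges to the transfer time paid only when
          u and v run on different processors
    Returns a dict: task -> (processor, start_time, finish_time).
    """
    # Build the predecessor lists from the successor lists.
    preds = {t: [] for t in succ}
    for u, successors in succ.items():
        for v in successors:
            preds[v].append(u)

    # Upward rank: mean cost of the task plus the most expensive path below it.
    rank = {}

    def upward_rank(t):
        if t not in rank:
            rank[t] = mean(cost[t]) + max(
                (comm.get((t, s), 0) + upward_rank(s) for s in succ[t]),
                default=0,
            )
        return rank[t]

    for t in succ:
        upward_rank(t)

    # With positive costs, decreasing rank is a valid topological order.
    order = sorted(succ, key=lambda t: -rank[t])

    proc_free = [0.0] * n_procs  # time at which each processor becomes idle
    placement = {}
    for t in order:
        best = None
        for p in range(n_procs):
            # Inputs are ready once every predecessor has finished and,
            # if it ran elsewhere, its data has been transferred.
            ready = max(
                (
                    placement[u][2]
                    + (0 if placement[u][0] == p else comm.get((u, t), 0))
                    for u in preds[t]
                ),
                default=0.0,
            )
            start = max(ready, proc_free[p])
            finish = start + cost[t][p]
            if best is None or finish < best[2]:
                best = (p, start, finish)
        placement[t] = best
        proc_free[best[0]] = best[2]
    return placement


# Hypothetical diamond-shaped task graph on two heterogeneous processors.
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
cost = {"A": [2, 3], "B": [3, 2], "C": [4, 4], "D": [3, 1]}
comm = {("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 1, ("C", "D"): 1}

placement = heft_schedule(succ, cost, comm, n_procs=2)
makespan = max(finish for _, _, finish in placement.values())
```

In this example the critical path A, C, D drives the ranking, so C is scheduled before B even though both depend only on A. A Work Stealing scheduler, by contrast, makes no such ahead-of-time placement and lets idle workers take tasks from busy ones at run time.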