On the Application Task Granularity and the Interplay with the Scheduling Overhead in Many-Core Shared Memory Systems

Task-based programming models are considered one of the most promising programming model approaches for exascale supercomputers because of their ability to dynamically react to changing conditions and reassign work to processing elements. One question, however, remains unsolved: what should the task...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Proceedings / IEEE International Conference on Cluster Computing S. 428 - 437
Hauptverfasser: Akhmetova, Dana, Kestor, Gokcen, Gioiosa, Roberto, Markidis, Stefano, Laure, Erwin
Format: Tagungsbericht
Sprache:Englisch
Veröffentlicht: IEEE 01.09.2015
Schlagworte:
ISSN:1552-5244
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Task-based programming models are considered one of the most promising programming model approaches for exascale supercomputers because of their ability to dynamically react to changing conditions and reassign work to processing elements. One question, however, remains unsolved: what should the task granularity of task-based applications be? Fine-grained tasks offer more opportunities to balance the system and generally result in higher system utilization. However, they also induce in large scheduling overhead. The impact of scheduling overhead on coarse-grained tasks is lower, but large systems may result imbalanced and underutilized. In this work we propose a methodology to analyze the interplay between application task granularity and scheduling overhead. Our methodology is based on three main points: 1) a novel task algorithm that analyzes an application directed acyclic graph (DAG) and aggregates tasks, 2) a fast and precise emulator to analyze the application behavior on systems with up to 1,024 cores, 3) a comprehensive sensitivity analysis of application performance and scheduling overhead breakdown. Our results show that there is an optimal task granularity between 1.2×10 4 and 10×10 4 cycles for the representative schedulers. Moreover, our analysis indicates that a suitable scheduler for exascale task-based applications should employ a best-effort local scheduler and a sophisticated remote scheduler to move tasks across worker threads.
ISSN:1552-5244
DOI:10.1109/CLUSTER.2015.65