Pushing the Boundaries of Small Tasks: Scalable Low-Overhead Data-Flow Programming in TTG

Bibliographic details
Published in: Proceedings / IEEE International Conference on Cluster Computing, pp. 117-128
Main authors: Schuchart, Joseph; Nookala, Poornima; Herault, Thomas; Valeev, Edward F.; Bosilca, George
Format: Conference paper
Language: English
Publication details: IEEE, 1 September 2022
ISSN: 2168-9253
Description
Summary: Shared-memory parallel programming models strive to provide low-overhead execution environments. Task-based programming models, in particular, are well-suited to cope with the ubiquitous multi- and many-core systems since they allow applications to express all available concurrency to a scheduler, which is tasked with exploiting the available hardware resources. It is the general consensus that atomic operations should be preferred over locks and mutexes to avoid inter-thread serialization and the resulting loss in efficiency. However, even atomic operations may serialize threads if not used judiciously. In this work, we discuss several optimizations applied to TTG and the underlying PaRSEC runtime system that aim to remove contentious atomic operations and reduce the overhead of task management to a few hundred clock cycles. The result is an optimized data-flow programming system that scales seamlessly from a single node to distributed execution and is able to compete with OpenMP in shared memory.
DOI: 10.1109/CLUSTER51413.2022.00026