MPI detach — Towards automatic asynchronous local completion


Full description

Bibliographic details
Published in: Parallel Computing, Vol. 109, p. 102859
Main authors: Protze, Joachim; Hermanns, Marc-André; Müller, Matthias S.; Nguyen, Van Man; Jaeger, Julien; Saillard, Emmanuelle; Carribault, Patrick; Barthou, Denis
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.03.2022
Keywords:
ISSN:0167-8191, 1872-7336
Online access: Full text
Description
Abstract: When aiming for large-scale parallel computing, waiting time due to network latency, synchronization, and load imbalance are the primary opponents of high parallel efficiency. A common approach to hide latency with computation is the use of non-blocking communication. In the presence of a consistent load imbalance, synchronization cost is just the visible symptom of the load imbalance. Tasking approaches as in OpenMP, TBB, OmpSs, or C++20 coroutines promise to expose a higher degree of concurrency, which can be distributed on available execution units and significantly increase load balance. Available MPI non-blocking functionality does not integrate seamlessly into such tasking parallelization. In this work, we present a slim extension of the MPI interface to allow seamless integration of non-blocking communication with available concepts of asynchronous execution in OpenMP and C++. Our concept allows spanning task dependency graphs for asynchronous execution over the full distributed-memory application. We furthermore investigate the compile-time analysis necessary to transform an application using blocking MPI communication into an application integrating OpenMP tasks with our proposed MPI interface extension.

Highlights:
• MPI interface extensions to transfer request completion back to the MPI library.
• Callback-driven notification of asynchronous completion back to the application.
• Prototype implementation of the interface independent of the MPI implementation.
• Integration of MPI communication into OpenMP task programming.
• Compile-time analysis to convert blocking communication into non-blocking.
DOI:10.1016/j.parco.2021.102859