A novel MPI+MPI hybrid approach combining MPI-3 shared memory windows and C11/C++11 memory model



Bibliographic Details
Published in: Journal of Parallel and Distributed Computing, Vol. 157, pp. 125-144
Main Authors: Quaranta, Lionel; Maddegedara, Lalith
Format: Journal Article
Language: English, Japanese
Published: Elsevier Inc / Elsevier BV, 01.11.2021
ISSN: 0743-7315, 1096-0848
Online Access: Full text
Description
Summary:
• Efficient management of MPI-3 shared memory windows with the C++11 memory model.
• Comparable performance with state-of-the-art implementations for collective operations.
• Improved process synchronization and reduced variance of execution times.
• Significant performance gain over flat MPI for the ghost update.
• Performance advantage over higher-level RMA synchronization primitives.
The increase in the number of cores in processors used in modern cluster architectures advocates hybrid parallel programming, combining the Message Passing Interface (MPI) for internode operations with a shared memory treatment of intranode operations. We propose an MPI+MPI hybrid approach to parallel programming in which shared memory operations are managed by the combination of the MPI shared memory windows introduced with MPI-3, C11/C++11 atomic operations, and the associated multi-thread memory model. We illustrate the method on fundamental parallel operations (barrier, reduction) and on the ghost update, which is prevalent in many parallel numerical methods. Performance tests on the Reedbush-U and Oakbridge-CX systems show that using the C11/C++11 memory model to manage shared memory windows achieves performance comparable to state-of-the-art MPI implementations, while reducing the variance of execution times and increasing the level of synchronization between processes, especially in multi-node environments. It also significantly reduces the execution time of ghost updates compared to flat MPI, and the synchronization of shared data with the C++11 memory model is observed to be more efficient than other synchronization methods based on RMA utilities.
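The summary above describes the core technique: intranode data placed in an MPI-3 shared memory window and synchronized with C11/C++11 atomic operations rather than MPI RMA synchronization calls. The following minimal C++ sketch (not taken from the paper) shows one common way to set this combination up: a node-local communicator, a window allocated with MPI_Win_allocate_shared, and a release/acquire flag used to publish data between processes on the same node. Reinterpreting raw window memory as std::atomic<int> is an assumption that holds on typical platforms where std::atomic<int> is lock-free and layout-compatible with int.

// Minimal sketch (not the paper's implementation): intranode release/acquire
// signalling through an MPI-3 shared memory window using C++11 atomics.
// Assumption: std::atomic<int> is lock-free and layout-compatible with int,
// so raw window memory may be reinterpreted as an atomic object.
#include <mpi.h>
#include <atomic>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    // Communicator of the MPI processes that share a physical node.
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);
    int node_rank;
    MPI_Comm_rank(node_comm, &node_rank);

    // Node-local rank 0 allocates one int of payload and one int used as a
    // flag; the other ranks allocate zero bytes and query rank 0's pointer.
    MPI_Aint bytes = (node_rank == 0) ? 2 * sizeof(int) : 0;
    int* base = nullptr;
    MPI_Win win;
    MPI_Win_allocate_shared(bytes, sizeof(int), MPI_INFO_NULL, node_comm,
                            &base, &win);

    MPI_Aint qsize; int qdisp;
    MPI_Win_shared_query(win, 0, &qsize, &qdisp, &base);

    // Passive-target epoch so that direct load/store access is permitted.
    MPI_Win_lock_all(MPI_MODE_NOCHECK, win);

    int* data = base;                                            // payload
    std::atomic<int>* flag = reinterpret_cast<std::atomic<int>*>(base + 1);

    if (node_rank == 0) flag->store(0, std::memory_order_relaxed);
    MPI_Barrier(node_comm);            // all ranks see the initialized flag

    if (node_rank == 0) {
        *data = 42;                                  // ordinary store
        flag->store(1, std::memory_order_release);   // publish the payload
    } else {
        while (flag->load(std::memory_order_acquire) != 1) { /* spin */ }
        std::printf("rank %d read %d\n", node_rank, *data);
    }

    MPI_Win_unlock_all(win);
    MPI_Win_free(&win);
    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}

On top of such a building block, node-level barriers, reductions, and ghost updates can in principle be expressed with acquire/release flags and counters instead of heavier RMA synchronization primitives, which is the kind of design the abstract's performance claims refer to.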
DOI: 10.1016/j.jpdc.2021.06.008