A novel MPI+MPI hybrid approach combining MPI-3 shared memory windows and C11/C++11 memory model
| Published in: | Journal of Parallel and Distributed Computing, Vol. 157, pp. 125–144 |
|---|---|
| Main authors: | , |
| Format: | Journal Article |
| Language: | English; Japanese |
| Published: | Elsevier Inc; Elsevier BV, 01.11.2021 |
| Subjects: | |
| ISSN: | 0743-7315, 1096-0848 |
| Online access: | Full text |
| Abstract: | •Efficient management of MPI-3 shared memory windows with the C++11 memory model. •Comparable performance with state-of-the-art implementations for collective operations. •Improved process synchronization and reduced variance of execution times. •Significant performance gain over flat MPI for the ghost update. •Performance advantage over higher-level RMA synchronization primitives. The increase in the number of cores in processors used in modern cluster architectures advocates hybrid parallel programming, combining the Message Passing Interface (MPI) for internode operations with a shared-memory treatment of intranode operations. We propose an MPI+MPI hybrid approach to parallel programming in which shared-memory operations are managed by combining the MPI shared memory windows introduced with MPI-3, C11/C++11 atomic operations, and the associated multi-thread memory model. We illustrate the method on fundamental parallel operations (barrier, reduction) and on the ghost update, which is prevalent in many parallel numerical methods. Performance tests on the Reedbush-U and Oakbridge-CX systems show that using the C11/C++11 memory model to manage shared memory windows achieves performance comparable to state-of-the-art MPI implementations, while reducing the variance of execution times and increasing the level of synchronization between processes, especially in multi-node environments. It also significantly reduces the execution time of ghost updates compared to flat MPI, and synchronizing shared data with the C++11 memory model is observed to be more efficient than other synchronization methods based on RMA utilities. (A minimal code sketch of the shared-window pattern follows this record.) |
|---|---|
| ISSN: | 0743-7315, 1096-0848 |
| DOI: | 10.1016/j.jpdc.2021.06.008 |
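
The pattern the abstract describes can be pictured with a short, hedged sketch: processes on one node allocate an MPI-3 shared memory window and then coordinate through a C++11 atomic placed inside that window, rather than through MPI RMA synchronization calls. This is not the authors' implementation; the `Shared` struct, the `node_comm` name, and the release/acquire flag protocol are illustrative choices, and the sketch assumes a cache-coherent node where `std::atomic<int>` is lock-free.

```cpp
// Sketch: MPI-3 shared memory window + C++11 release/acquire synchronization.
#include <mpi.h>
#include <atomic>
#include <cstdio>
#include <new>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    // Split COMM_WORLD into per-node communicators so the window is
    // allocated among processes that can actually share memory.
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);
    int node_rank;
    MPI_Comm_rank(node_comm, &node_rank);

    // Rank 0 of the node owns the shared segment; the other ranks attach
    // with size 0 and query rank 0's base pointer.
    struct Shared { std::atomic<int> flag; double value; };
    MPI_Win win;
    Shared* shm = nullptr;
    MPI_Aint size = (node_rank == 0) ? (MPI_Aint)sizeof(Shared) : 0;
    MPI_Win_allocate_shared(size, 1, MPI_INFO_NULL, node_comm, &shm, &win);
    if (node_rank != 0) {
        MPI_Aint qsize; int disp;
        MPI_Win_shared_query(win, 0, &qsize, &disp, &shm);
    }

    if (node_rank == 0) {
        new (&shm->flag) std::atomic<int>(0);  // construct the atomic in place
        shm->value = 0.0;
    }
    MPI_Barrier(node_comm);  // initialization done before anyone reads

    if (node_rank == 0) {
        shm->value = 42.0;                              // publish data ...
        shm->flag.store(1, std::memory_order_release);  // ... then release it
    } else {
        // Acquire load pairs with the release store above, so the write to
        // shm->value is visible once the flag is observed as set.
        while (shm->flag.load(std::memory_order_acquire) != 1) { /* spin */ }
        std::printf("rank %d sees value %g\n", node_rank, shm->value);
    }

    MPI_Barrier(node_comm);
    MPI_Win_free(&win);
    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}
```

The release store and acquire load stand in for the higher-level RMA synchronization primitives mentioned in the abstract: ordering of accesses to the shared window is expressed directly in the C++11 memory model rather than through MPI epochs.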