Chasing Away RAts: Semantics and evaluation for relaxed atomics on heterogeneous systems

Detailed bibliography
Published in:	2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pp. 161-174
Main authors:	Sinclair, Matthew D.; Alsop, Johnathan; Adve, Sarita V.
Medium:	Conference paper
Language:	English
Published:	ACM, 1 June 2017
Subjects:	Benchmark testing; C++ languages; data-race-free models; GPGPU; Graphics processing units; Java; memory consistency; Optimization; relaxed atomics; Semantics; Synchronization

Description
Abstract:	An unambiguous and easy-to-understand memory consistency model is crucial for ensuring correct synchronization and guiding future design of heterogeneous systems. In a widely adopted approach, the memory model guarantees sequential consistency (SC) as long as programmers obey certain rules. The popular data-race-free-0 (DRF0) model exemplifies this SC-centric approach by requiring programmers to avoid data races. Recent industry models, however, have extended such SC-centric models to incorporate relaxed atomics. These extensions can improve performance, but they are difficult to specify formally and to use correctly. This work addresses the impact of relaxed atomics on consistency models for heterogeneous systems in two ways. First, we introduce a new model, Data-Race-Free-Relaxed (DRFrlx), that extends DRF0 to provide SC-centric semantics for the common use cases of relaxed atomics. Second, we evaluate the performance of relaxed atomics in CPU-GPU systems for these use cases. We find mixed results: in most cases, relaxed atomics provide only a small benefit in execution time, but in some cases they help significantly (e.g., up to 51% for DRFrlx over DRF0).
DOI:	10.1145/3079856.3080206