The Random Address Shift to Reduce the Memory Access Congestion on the Discrete Memory Machine

Bibliographic Details
Published in: International Symposium on Computing and Networking (Online), pp. 95-103
Main Authors: Nakano, Koji; Matsumae, Susumu; Ito, Yasuaki
Format: Conference Paper
Language: English
Published: IEEE, 1 December 2013
ISSN: 2379-1888
Description
Summary: The Discrete Memory Machine (DMM) is a theoretical parallel computing model that captures the essence of memory access by the streaming multiprocessor on CUDA-enabled GPUs. The DMM has w memory banks that constitute a shared memory, and the w threads in a warp try to access them at the same time. However, memory access requests destined for the same memory bank are processed sequentially. Hence, to develop efficient algorithms it is very important to reduce the memory access congestion, that is, the maximum number of memory access requests destined for the same bank. The memory access congestion takes a value between 1 and w. The main contribution of this paper is a novel algorithmic technique called the random address shift, which reduces the memory access congestion. We show that the expected memory access congestion is O(log w / log log w) for any memory access requests by a warp of w threads, including malicious ones. The simulation results show that the expected congestion for w = 32 threads is only 3.436. Since malicious memory access requests that are all destined for the same bank have congestion 32, our random address shift technique substantially reduces the memory access congestion. We have applied the random address shift technique to matrix transpose algorithms. The experimental results on a GeForce GTX Titan show that the random address shift technique is practical and can accelerate the straightforward matrix transpose algorithms by a factor of 5.
DOI: 10.1109/CANDAR.2013.21
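
The congestion measure described in the summary can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes that address a lives in bank a mod w, that a w x w matrix is stored row by row, and that the shift is applied as an independent random rotation of each row (element (i, j) stored at column (j + r_i) mod w). The abstract does not spell out the exact scheme, so this layout and all function names (congestion, column_access, row_access, mean_shifted_congestion) are illustrative assumptions.

```python
import random
from collections import Counter

W = 32  # number of memory banks and threads per warp (w in the summary)


def congestion(addresses, w=W):
    """Largest number of requests that fall into a single bank.

    Assumption: address a is served by bank a mod w.
    """
    banks = Counter(a % w for a in addresses)
    return max(banks.values())


def column_access(j, shifts=None, w=W):
    """Addresses touched when w threads read column j of a w x w matrix.

    Assumption: row i is rotated by shifts[i]; no shifts means a plain layout.
    """
    if shifts is None:
        shifts = [0] * w
    return [i * w + (j + shifts[i]) % w for i in range(w)]


def row_access(i, shifts=None, w=W):
    """Addresses touched when w threads read row i of the matrix."""
    if shifts is None:
        shifts = [0] * w
    return [i * w + (j + shifts[i]) % w for j in range(w)]


def mean_shifted_congestion(trials, seed=0, w=W):
    """Average congestion of a column access over freshly drawn random shifts."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        shifts = [rng.randrange(w) for _ in range(w)]
        total += congestion(column_access(0, shifts, w), w)
    return total / trials


if __name__ == "__main__":
    print("column access, no shift:", congestion(column_access(0)))
    print("row access, no shift:   ", congestion(row_access(0)))
    print("column access, shifted (mean):", mean_shifted_congestion(1000))
```

Without shifts, a column access sends all 32 requests to one bank (congestion 32), while a row access touches every bank once (congestion 1). With per-row rotations, a row access remains conflict-free because a rotation only permutes the banks within a row, and a column access is spread over randomly chosen banks. The mean printed by this sketch depends on the assumed layout and the random seed, so it should not be expected to reproduce the paper's reported figure exactly.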