Augmentation of Programs with CUDA Streams

Bibliographic details
Published in:	2012 IEEE 10th International Symposium on Parallel and Distributed Processing with Applications, pp. 855-856
Main authors:	Sharmistha, Amilkanthwar, M., Balachandran, S.
Format:	Conference paper
Language:	English
Publication details:	IEEE, 01.07.2012
Subjects:	CUDA; Graphics processing unit; Kernel; Optimization; Parallel processing; Pluto; Tiles
ISBN:	1467316318, 9781467316316
ISSN:	2158-9178

Description
Summary:	A program that runs on a General Purpose Graphics Processing Unit (GPGPU) has to stall if its data is not resident on the GPGPU. With the CUDA 2.0 architecture, data can be streamed while computation is still in progress. Exploiting this feature requires careful orchestration of data transfer and computation, which typically demands significant effort from the programmer. We propose an approach for transforming C programs into programs that make use of CUDA streams. We identify the regions where data transfer and computation can be overlapped by using a polyhedral framework called PLUTO [2]. We use the PLUTO framework to tile the source code automatically and use the streaming capabilities to overlap data transfer with computation. Our results show an average speedup of 1.5X over CUDA programs without streaming optimizations.
DOI:	10.1109/ISPA.2012.132
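
The summary describes overlapping the transfer of one tile with the computation on another. As a rough illustration of why this helps, the sketch below compares a schedule with no overlap against a two-stage pipeline. It is a simplified timing model, not the paper's method or measurements: the function names, the uniform per-tile costs `copy` and `compute`, and the assumption that the device can buffer every tile copied ahead of time are all hypothetical.

```c
#include <stddef.h>

/* No overlap: every tile is copied to the device, then computed,
 * before the next tile's copy begins. */
double serial_time(int n_tiles, double copy, double compute) {
    if (n_tiles <= 0) return 0.0;
    return n_tiles * (copy + compute);
}

/* Two-stage pipeline (assumed model): one copy engine and one compute
 * engine run concurrently, so the copy of tile i+1 can overlap the
 * computation on tile i. Copies are issued back to back, which assumes
 * enough device memory to hold tiles that arrive early. */
double overlapped_time(int n_tiles, double copy, double compute) {
    double copy_done = 0.0;     /* time at which the current tile's copy ends */
    double compute_done = 0.0;  /* time at which the current tile's compute ends */
    for (int i = 0; i < n_tiles; ++i) {
        copy_done += copy;
        /* A tile's computation starts once its data has arrived and the
         * compute engine has finished the previous tile. */
        double start = copy_done > compute_done ? copy_done : compute_done;
        compute_done = start + compute;
    }
    return compute_done;
}
```

For 8 tiles with `copy = 1.0` and `compute = 1.0`, `serial_time` gives 16.0 and `overlapped_time` gives 9.0, a ratio of about 1.78. Under this model the ratio can never exceed 2, because each tile still occupies the slower of the two engines for its full duration.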