Exploiting concurrent kernel execution on graphic processing units
Graphics processing units (CPUs) have been accepted as a powerful and viable coprocessor solution in high-performance computing domain. In order to maximize the benefit of CPUs for a multicore platform, a mechanism is needed for CPU threads in a parallel application to share this computing resource...
Uložené v:
| Vydané v: | 2011 International Conference on High Performance Computing and Simulation s. 24 - 32 |
|---|---|
| Hlavní autori: | , , |
| Médium: | Konferenčný príspevok.. |
| Jazyk: | English Japanese |
| Vydavateľské údaje: |
IEEE
01.07.2011
|
| Predmet: | |
| On-line prístup: | Získať plný text |
| Tagy: |
Pridať tag
Žiadne tagy, Buďte prvý, kto otaguje tento záznam!
|
| Shrnutí: | Graphics processing units (CPUs) have been accepted as a powerful and viable coprocessor solution in high-performance computing domain. In order to maximize the benefit of CPUs for a multicore platform, a mechanism is needed for CPU threads in a parallel application to share this computing resource for efficient execution. NVIDIA's Fermi architecture pioneers the feature of concurrent kernel execution; however, only kernels of the same thread context can execute in parallel. In order to get the best use of a GPU device in a multi-threaded application environment, this paper explores the techniques to effectively share a context, i.e., context funneling, which could be done either manually at application level, or automatically at the GPU runtime starting from CUDA v4.0. For synthetic microbenchmark tests, we find that both funneling mechanisms are more capable of exploring the benefit of concurrent kernel execution than traditional context switching, therefore improving the overall application performance. We also find that the manual funneling mechanism provides the highest performance and more explicit control, while CUDA v4.0 provides better productivity with good performance. Finally, we assess the impact of such techniques on a compact application benchmark, SSCA#3 - SAR sensor processing. |
|---|---|
| DOI: | 10.1109/HPCSim.2011.5999803 |