Algorithmic strategies for optimizing the parallel reduction primitive in CUDA
Many general-purpose applications exploit Graphics Processing Units (GPUs) by executing a set of well-known data-parallel primitives. These primitives are usually invoked from the host many times, so their throughput has a great impact on the performance of the overall system. Thus, the study of nove...
| Published in: | 2012 International Conference on High Performance Computing and Simulation, pp. 511-519 |
|---|---|
| Main Authors: | , , , |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 01.07.2012 |
| ISBN: | 9781467323598, 1467323594 |

