Towards a performance-portable description of geometric multigrid algorithms using a domain-specific language

High Performance Computing (HPC) systems are nowadays more and more heterogeneous. Different processor types can be found on a single node including accelerators such as Graphics Processing Units (GPUs). To cope with the challenge of programming such complex systems, this work presents a domain-spec...

Full description

Saved in:
Bibliographic Details
Published in:Journal of parallel and distributed computing Vol. 74; no. 12; pp. 3191 - 3201
Main Authors: Membarth, Richard, Reiche, Oliver, Schmitt, Christian, Hannig, Frank, Teich, Jürgen, Stürmer, Markus, Köstler, Harald
Format: Journal Article
Language:English
Published: Elsevier Inc 01.12.2014
Subjects:
ISSN:0743-7315, 1096-0848
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:High Performance Computing (HPC) systems are nowadays more and more heterogeneous. Different processor types can be found on a single node including accelerators such as Graphics Processing Units (GPUs). To cope with the challenge of programming such complex systems, this work presents a domain-specific approach to automatically generate code tailored to different processor types. Low-level CUDA and OpenCL code is generated from a high-level description of an algorithm specified in a Domain-Specific Language (DSL) instead of writing hand-tuned code for GPU accelerators. The DSL is part of the Heterogeneous Image Processing Acceleration (HIPAcc) framework and was extended in this work to handle grid hierarchies in order to model different cycle types. Language constructs are introduced to process and represent data at different resolutions. This allows to describe image processing algorithms that work on image pyramids as well as multigrid methods in the stencil domain. By decoupling the algorithm from its schedule, the proposed approach allows to generate efficient stencil code implementations. Our results show that similar performance compared to hand-tuned codes can be achieved. •DSL extension to handle image pyramids and grid hierarchies.•DSL extension to model different multigrid cycle types.•Generated GPU code shows similar performance compared to hand-tuned implementation.•We apply the algorithm to high dynamic range compression of 2D X-ray images.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ISSN:0743-7315
1096-0848
DOI:10.1016/j.jpdc.2014.08.008