Modeling Algorithmic Skeletons for Automatic Parallelization Using Template Metaprogramming

This article presents a framework for algorithmic skeletons that aims at representing a whole algorithm, both its sequential and possibly parallelizable parts, in order to enable making global decisions about its implementation. With our modeling, a skeleton is described by an algorithmic structure...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:2019 International Conference on High Performance Computing & Simulation (HPCS) S. 265 - 272
Hauptverfasser: Pereda, Alexis, Hill, David R.C., Mazel, Claude, Bachelet, Bruno
Format: Tagungsbericht
Sprache:Englisch
Veröffentlicht: IEEE 01.07.2019
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:This article presents a framework for algorithmic skeletons that aims at representing a whole algorithm, both its sequential and possibly parallelizable parts, in order to enable making global decisions about its implementation. With our modeling, a skeleton is described by an algorithmic structure and a data flow graph, built from the composition of bones and other skeletons. We introduce this notion of bones which represents elementary sequential or parallel patterns whose implementation is available (from the library or designed by well-aware developers), whereas skeletons are automatically implemented from their description. The proposed design, implemented with Template Metaprogramming (TMP), able to operate both at compile- and run-time, allows implementing new bones, describing new skeletons, or simply the instantiation of a skeleton by providing muscles in the form of sequential functions.Once a skeleton is instantiated, one can decide to generate either a sequential or a parallel code of the algorithm. To optimize the parallelization process, we propose orchestrators, in the form of C++ templates that can analyze a skeleton at compile-time and tune its execution.A C++ library-based solution is presented, and its mechanisms and usage are illustrated by implementing a GRASPxELS algorithm, a common OR metaheuristic, that enables two levels of parallelism. Performance results are shown to assert that this approach presents negligible run-time overhead.
DOI:10.1109/HPCS48598.2019.9188128