Self adaptive run time scheduling for the automatic parallelization of loops with the C2μTC/SL compiler


Bibliographic Details
Published in:	Parallel Computing, Vol. 39, No. 10, pp. 603-614
Main Authors:	Saougkos, Dimitris; Manis, George
Format:	Journal Article
Language:	English
Published:	1 October 2013
Subjects:	Algorithms; Compilers; Computation; Diffusion; Hyperplanes; Mathematical models; Parallel processing; Run time (computers)
ISSN:	0167-8191
Online Access:	Full text

Description
Abstract:	In this paper we suggest a new approach for solving the hyperplane problem, also known as "wavefront" computation. In direct contrast to most approaches, which reduce the problem to an integer programming one or use several heuristics, we gather information at compile time and delegate the solution to run time. We present an adaptive technique which calculates which new threads can be executed in the next computation cycle based on which threads are executed in the current one. Moving the solution to the run time environment provides higher versatility, and the underlying hyperplane pattern is discovered exactly without the need to perform any prior calculations. The main contribution of this paper is the self adaptive algorithm, an algorithm which does not need to know the tile size (which controls the granularity of parallelism) beforehand. Instead, the algorithm itself adapts the tile size while the program is running in order to achieve optimal efficiency. Experimental results show that if we have a sufficient number of parallel processing elements to diffuse the scheduler's workload, its overhead becomes low enough that it is overshadowed by the net gain in parallelism. For the implementation of the proposed algorithm and for our experiments we use our parallelizing compiler C2μTC/SL, a parallelizing C compiler which maps sequential programs onto the SVP processor and model.
DOI:	10.1016/j.parco.2013.07.001
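
The idea stated in the abstract, deriving the threads of the next computation cycle from those executed in the current one, can be illustrated with a minimal Python sketch. It is not the authors' algorithm and does not model C2μTC/SL or SVP: the function name `wavefront_schedule`, the rectangular two-dimensional loop nest, and the dependence vectors `(1, 0)` and `(0, 1)` are all assumptions chosen for illustration, and tile-size adaptation is left out entirely.

```python
# Illustrative sketch only. The names, the grid shape and the dependence
# vectors are assumptions, not taken from the paper or its compiler.

def wavefront_schedule(rows, cols, deps=((1, 0), (0, 1))):
    """Group iterations (i, j) of a rows x cols loop nest into cycles.

    `deps` holds dependence distance vectors: iteration (i, j) must wait
    for (i - di, j - dj) for every (di, dj) in `deps` that lies in the grid.
    """
    def in_grid(i, j):
        return 0 <= i < rows and 0 <= j < cols

    # Count the unfinished predecessors of every iteration. This stands in
    # for the information assumed to be available before the loop runs.
    pending = {
        (i, j): sum(in_grid(i - di, j - dj) for di, dj in deps)
        for i in range(rows)
        for j in range(cols)
    }

    # Iterations without predecessors form the first cycle.
    current = [p for p, n in pending.items() if n == 0]
    cycles = []
    while current:
        cycles.append(sorted(current))
        nxt = []
        # Only successors of the iterations that just ran are inspected,
        # so the next cycle is derived from the current one.
        for i, j in current:
            for di, dj in deps:
                succ = (i + di, j + dj)
                if in_grid(*succ):
                    pending[succ] -= 1
                    if pending[succ] == 0:
                        nxt.append(succ)
        current = nxt
    return cycles
```

For a 3 x 3 grid, `wavefront_schedule(3, 3)` returns five cycles, and every iteration `(i, j)` in cycle `k` satisfies `i + j == k`. The anti-diagonal hyperplanes therefore emerge from the run-time bookkeeping alone, without being computed in advance.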