Automated Scheduling Algorithm Selection and Chunk Parameter Calculation in OpenMP

Increasing node and cores-per-node counts in supercomputers render scheduling and load balancing critical for exploiting parallelism. OpenMP applications can achieve high performance via careful selection of scheduling kind and chunk parameters on a per-loop, per-application, and per-system basis fr...

Full description

Saved in:
Bibliographic Details
Published in:IEEE transactions on parallel and distributed systems Vol. 33; no. 12; pp. 4383 - 4394
Main Authors: Mohammed, Ali, Korndorfer, Jonas H. Muller, Eleliemy, Ahmed, Ciorba, Florina M.
Format: Journal Article
Language:English
Published: New York IEEE 01.12.2022
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:
ISSN:1045-9219, 1558-2183
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Increasing node and cores-per-node counts in supercomputers render scheduling and load balancing critical for exploiting parallelism. OpenMP applications can achieve high performance via careful selection of scheduling kind and chunk parameters on a per-loop, per-application, and per-system basis from a portfolio of advanced scheduling algorithms (Korndörfer et al. , 2022). This selection approach is time-consuming, challenging, and may need to change during execution. We propose Auto4OMP , a novel approach for automated load balancing of OpenMP applications. With Auto4OMP, we introduce three scheduling algorithm selection methods and an expert-defined chunk parameter for OpenMP's schedule clause's kind and chunk , respectively. Auto4OMP extends the OpenMP schedule(auto) and chunk parameter implementation in LLVM's OpenMP runtime library to automatically select a scheduling algorithm and calculate a chunk parameter during execution. Loop characteristics are inferred in Auto4OMP from the loop execution over the application's time-steps. The experiments performed in this work show that Auto4OMP improves applications performance by up to <inline-formula><tex-math notation="LaTeX">11\%</tex-math> <mml:math><mml:mrow><mml:mn>11</mml:mn><mml:mo>%</mml:mo></mml:mrow></mml:math><inline-graphic xlink:href="ciorba-ieq1-3189270.gif"/> </inline-formula> compared to LLVM's schedule(auto) implementation and outperforms manual selection. Auto4OMP improves MPI+OpenMP applications performance by explicitly minimizing thread- and implicitly reducing process-load imbalance.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:1045-9219
1558-2183
DOI:10.1109/TPDS.2022.3189270