Efficient Execution of OpenMP on GPUs

Bibliographic Details
Published in: 2022 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pp. 41-52
Main Authors: Huber, Joseph; Cornelius, Melanie; Georgakoudis, Giorgis; Tian, Shilei; Diaz, Jose M Monsalve; Dinel, Kuter; Chapman, Barbara; Doerfert, Johannes
Format: Conference Proceeding
Language: English
Published: IEEE, 2 April 2022
Abstract: OpenMP is the preferred choice for CPU parallelism in high-performance computing (HPC) applications written in C, C++, or Fortran. As HPC systems became heterogeneous, OpenMP introduced support for accelerator offloading via the target directive. This allowed porting existing (CPU) code onto GPUs, including well-established CPU parallelism paradigms. However, there are architectural differences between CPU and GPU execution which make common patterns, like forking and joining threads, single-threaded execution, or sharing of local (stack) variables, in general costly on the latter. So far it was left to the user to identify and avoid inefficient code patterns, most commonly by writing their OpenMP offloading codes in a kernel-language style which resembles CUDA more than it does traditional OpenMP. In this work we present OpenMP-aware program analyses and optimizations that allow efficient execution of the generic, CPU-centric parallelism model provided by OpenMP on GPUs. Our implementation in LLVM/Clang maps various common OpenMP patterns found in real-world applications efficiently to the GPU. As static analysis is inherently limited, we provide actionable and informative feedback to the user about the performed and missed optimizations, together with ways for the user to annotate the program for better results. Our extensive evaluation using several HPC proxy applications shows significantly improved GPU kernel times and reduced resource requirements, such as GPU registers.
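The contrast the abstract draws between CPU-centric OpenMP and kernel-language style can be illustrated with a minimal sketch. The code is not taken from the paper: the function names `scale_generic` and `scale_kernel_style` and the computation itself are hypothetical, chosen only to show the three patterns the abstract names (single-threaded execution, fork/join, and sharing of a local variable) next to a flat, CUDA-like equivalent.

```c
/* Hypothetical example, not from the paper.
 * CPU-centric style: the target region starts single-threaded, computes a
 * local (stack) variable, then forks a parallel region that shares it. */
void scale_generic(double *x, int n, double base) {
  #pragma omp target map(tofrom: x[0:n])
  {
    /* Sequential part: executed by a single thread on the device. */
    double factor = base * 2.0; /* local variable shared with the workers */

    /* Fork: all threads of the parallel region read 'factor'. */
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
      x[i] *= factor;
    /* Join: execution continues single-threaded after the loop. */
  }
}

/* Kernel-language style: one combined directive, no sequential part and no
 * shared local variable; every thread recomputes the factor itself. */
void scale_kernel_style(double *x, int n, double base) {
  #pragma omp target teams distribute parallel for map(tofrom: x[0:n])
  for (int i = 0; i < n; ++i)
    x[i] *= base * 2.0;
}
```

Both functions compute the same result. If the compiler is run without OpenMP support, the pragmas are ignored and the loops execute sequentially on the host, so the sketch stays valid either way. The first form is the kind of code the abstract says was previously costly on GPUs; the second is the style users were expected to write by hand.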
Author_xml – sequence: 1
  givenname: Joseph
  surname: Huber
  fullname: Huber, Joseph
  email: huberjn@ornl.gov
  organization: Oak Ridge National Laboratory,Oak Ridge,USA
– sequence: 2
  givenname: Melanie
  surname: Cornelius
  fullname: Cornelius, Melanie
  email: mdooley1@hawk.iit.edu
  organization: Illinois Institute of Technology,Chicago,USA
– sequence: 3
  givenname: Giorgis
  surname: Georgakoudis
  fullname: Georgakoudis, Giorgis
  email: georgakoudis1@llnl.gov
  organization: Lawrence Livermore National Laboratory,Livermore,USA
– sequence: 4
  givenname: Shilei
  surname: Tian
  fullname: Tian, Shilei
  email: shilei.tian@stonybrook.edu
  organization: Stony Brook University,Stony Brook,USA
– sequence: 5
  givenname: Jose M Monsalve
  surname: Diaz
  fullname: Diaz, Jose M Monsalve
  email: jmonsalvediaz@anl.gov
  organization: Argonne National Laboratory,Lemont,USA
– sequence: 6
  givenname: Kuter
  surname: Dinel
  fullname: Dinel, Kuter
  email: kuterdinel@gmail.com
  organization: Düzce University,Düzce,Turkey
– sequence: 7
  givenname: Barbara
  surname: Chapman
  fullname: Chapman, Barbara
  email: barbara.chapman@stonybrook.edu
  organization: Stony Brook University,Stony Brook,USA
– sequence: 8
  givenname: Johannes
  surname: Doerfert
  fullname: Doerfert, Johannes
  email: jdoerfert@anl.gov
  organization: Argonne National Laboratory,Lemont,USA
DOI 10.1109/CGO53902.2022.9741290
EISBN 1665405848
9781665405843
Genre orig-research
GrantInformation_xml – fundername: Battelle
  funderid: 10.13039/100000993
ISICitedReferencesCount 22
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed false
IsScholarly true
OpenAccessLink https://www.osti.gov/biblio/1890075
PageCount 12
SubjectTerms Analytical models
Codes
GPU
Graphics processing units
LLVM
Offloading
OpenMP
Optimization
Parallel processing
Runtime
Semantics
Static analysis
URI https://ieeexplore.ieee.org/document/9741290