Efficient Execution of OpenMP on GPUs

OpenMP is the preferred choice for CPU parallelism in High-Performance-Computing (HPC) applications written in C, C++, or Fortran. As HPC systems became heterogeneous, OpenMP introduced support for accelerator offloading via the target directive. This allowed porting existing (CPU) code onto GPUs, i...

Bibliographic Details
Published in:	2022 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pp. 41-52
Main Authors:	Huber, Joseph; Cornelius, Melanie; Georgakoudis, Giorgis; Tian, Shilei; Diaz, Jose M Monsalve; Dinel, Kuter; Chapman, Barbara; Doerfert, Johannes
Format:	Conference Proceeding
Language:	English
Published:	IEEE, 2 April 2022
Subjects:	Analytical models; Codes; GPU; Graphics processing units; LLVM; Offloading; OpenMP; Optimization; Parallel processing; Runtime; Semantics; Static analysis

Abstract	OpenMP is the preferred choice for CPU parallelism in High-Performance-Computing (HPC) applications written in C, C++, or Fortran. As HPC systems became heterogeneous, OpenMP introduced support for accelerator offloading via the target directive. This allowed porting existing (CPU) code onto GPUs, including well-established CPU parallelism paradigms. However, there are architectural differences between CPU and GPU execution which make common patterns, like forking and joining threads, single-threaded execution, or sharing of local (stack) variables, in general costly on the latter. So far it was left to the user to identify and avoid inefficient code patterns, most commonly by writing their OpenMP offloading codes in a kernel-language style which resembles CUDA more than it does traditional OpenMP. In this work we present OpenMP-aware program analyses and optimizations that allow efficient execution of the generic, CPU-centric parallelism model provided by OpenMP on GPUs. Our implementation in LLVM/Clang maps various common OpenMP patterns found in real-world applications efficiently to the GPU. As static analysis is inherently limited, we provide actionable and informative feedback to the user about the performed and missed optimizations, together with ways for the user to annotate the program for better results. Our extensive evaluation using several HPC proxy applications shows significantly improved GPU kernel times and a reduction in resource requirements, such as GPU registers.
Author Affiliations:
	1. Huber, Joseph (huberjn@ornl.gov), Oak Ridge National Laboratory, Oak Ridge, USA
	2. Cornelius, Melanie (mdooley1@hawk.iit.edu), Illinois Institute of Technology, Chicago, USA
	3. Georgakoudis, Giorgis (georgakoudis1@llnl.gov), Lawrence Livermore National Laboratory, Livermore, USA
	4. Tian, Shilei (shilei.tian@stonybrook.edu), Stony Brook University, Stony Brook, USA
	5. Diaz, Jose M Monsalve (jmonsalvediaz@anl.gov), Argonne National Laboratory, Lemont, USA
	6. Dinel, Kuter (kuterdinel@gmail.com), Düzce University, Düzce, Turkey
	7. Chapman, Barbara (barbara.chapman@stonybrook.edu), Stony Brook University, Stony Brook, USA
	8. Doerfert, Johannes (jdoerfert@anl.gov), Argonne National Laboratory, Lemont, USA
DOI	10.1109/CGO53902.2022.9741290
EISBN	1665405848 9781665405843
GrantInformation_xml	– fundername: Battelle funderid: 10.13039/100000993
OpenAccessLink	https://www.osti.gov/biblio/1890075
URI	https://ieeexplore.ieee.org/document/9741290