Efficient Execution of OpenMP on GPUs
OpenMP is the preferred choice for CPU parallelism in High-Performance-Computing (HPC) applications written in C, C++, or Fortran. As HPC systems became heterogeneous, OpenMP introduced support for accelerator offloading via the target directive. This allowed porting existing (CPU) code onto GPUs, i...
| Published in: | 2022 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) pp. 41 - 52 |
|---|---|
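| Main Authors: | Huber, Joseph; Cornelius, Melanie; Georgakoudis, Giorgis; Tian, Shilei; Diaz, Jose M Monsalve; Dinel, Kuter; Chapman, Barbara; Doerfert, Johannes |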
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 02.04.2022 |
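| Subjects: | Analytical models; Codes; GPU; Graphics processing units; LLVM; Offloading; OpenMP; Optimization; Parallel processing; Runtime; Semantics; Static analysis |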
| Abstract | OpenMP is the preferred choice for CPU parallelism in High-Performance-Computing (HPC) applications written in C, C++, or Fortran. As HPC systems became heterogeneous, OpenMP introduced support for accelerator offloading via the target directive. This allowed porting existing (CPU) code onto GPUs, including well-established CPU parallelism paradigms. However, there are architectural differences between CPU and GPU execution which make common patterns, like forking and joining threads, single-threaded execution, or sharing of local (stack) variables, in general costly on the latter. So far it was left to the user to identify and avoid inefficient code patterns, most commonly by writing their OpenMP offloading codes in a kernel-language style which resembles CUDA more than it does traditional OpenMP. In this work, we present OpenMP-aware program analyses and optimizations that allow efficient execution of the generic, CPU-centric parallelism model provided by OpenMP on GPUs. Our implementation in LLVM/Clang maps various common OpenMP patterns found in real-world applications efficiently to the GPU. As static analysis is inherently limited, we provide actionable and informative feedback to the user about the performed and missed optimizations, together with ways for the user to annotate the program for better results. Our extensive evaluation using several HPC proxy applications shows significantly improved GPU kernel times and a reduction in resource requirements, such as GPU registers. |
|---|---|
| Author | Chapman, Barbara; Diaz, Jose M Monsalve; Huber, Joseph; Tian, Shilei; Doerfert, Johannes; Cornelius, Melanie; Dinel, Kuter; Georgakoudis, Giorgis |
| Author_xml | – sequence: 1 givenname: Joseph surname: Huber fullname: Huber, Joseph email: huberjn@ornl.gov organization: Oak Ridge National Laboratory,Oak Ridge,USA – sequence: 2 givenname: Melanie surname: Cornelius fullname: Cornelius, Melanie email: mdooley1@hawk.iit.edu organization: Illinois Institute of Technology,Chicago,USA – sequence: 3 givenname: Giorgis surname: Georgakoudis fullname: Georgakoudis, Giorgis email: georgakoudis1@llnl.gov organization: Lawrence Livermore National Laboratory,Livermore,USA – sequence: 4 givenname: Shilei surname: Tian fullname: Tian, Shilei email: shilei.tian@stonybrook.edu organization: Stony Brook University,Stony Brook,USA – sequence: 5 givenname: Jose M Monsalve surname: Diaz fullname: Diaz, Jose M Monsalve email: jmonsalvediaz@anl.gov organization: Argonne National Laboratory,Lemont,USA – sequence: 6 givenname: Kuter surname: Dinel fullname: Dinel, Kuter email: kuterdinel@gmail.com organization: Düzce University,Düzce,Turkey – sequence: 7 givenname: Barbara surname: Chapman fullname: Chapman, Barbara email: barbara.chapman@stonybrook.edu organization: Stony Brook University,Stony Brook,USA – sequence: 8 givenname: Johannes surname: Doerfert fullname: Doerfert, Johannes email: jdoerfert@anl.gov organization: Argonne National Laboratory,Lemont,USA |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/CGO53902.2022.9741290 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| EISBN | 1665405848 9781665405843 |
| EndPage | 52 |
| ExternalDocumentID | 9741290 |
| Genre | orig-research |
| GrantInformation_xml | – fundername: Battelle funderid: 10.13039/100000993 |
| GroupedDBID | 6IE 6IL ACM ALMA_UNASSIGNED_HOLDINGS APO CBEJK GUFHI LHSKQ RIE RIL |
| ID | FETCH-LOGICAL-a264t-e34c0668cc1c652945b9ca8c6a6edbb74ea09d91920804c6f9ae513a93fc49093 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 22 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000827636600004&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Wed Aug 27 02:36:12 EDT 2025 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-a264t-e34c0668cc1c652945b9ca8c6a6edbb74ea09d91920804c6f9ae513a93fc49093 |
| OpenAccessLink | https://www.osti.gov/biblio/1890075 |
| PageCount | 12 |
| ParticipantIDs | ieee_primary_9741290 |
| PublicationCentury | 2000 |
| PublicationDate | 2022-April-2 |
| PublicationDateYYYYMMDD | 2022-04-02 |
| PublicationDate_xml | – month: 04 year: 2022 text: 2022-April-2 day: 02 |
| PublicationDecade | 2020 |
| PublicationTitle | 2022 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) |
| PublicationTitleAbbrev | CGO |
| PublicationYear | 2022 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssib048637701 |
| Score | 2.308359 |
| Snippet | OpenMP is the preferred choice for CPU parallelism in High-Performance-Computing (HPC) applications written in C, C++, or Fortran. As HPC systems became... |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 41 |
| SubjectTerms | Analytical models; Codes; GPU; Graphics processing units; LLVM; Offloading; OpenMP; Optimization; Parallel processing; Runtime; Semantics; Static analysis |
| Title | Efficient Execution of OpenMP on GPUs |
| URI | https://ieeexplore.ieee.org/document/9741290 |
| WOSCitedRecordID | wos000827636600004&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=2022+IEEE%2FACM+International+Symposium+on+Code+Generation+and+Optimization+%28CGO%29&rft.atitle=Efficient+Execution+of+OpenMP+on+GPUs&rft.au=Huber%2C+Joseph&rft.au=Cornelius%2C+Melanie&rft.au=Georgakoudis%2C+Giorgis&rft.au=Tian%2C+Shilei&rft.date=2022-04-02&rft.pub=IEEE&rft.spage=41&rft.epage=52&rft_id=info:doi/10.1109%2FCGO53902.2022.9741290&rft.externalDocID=9741290 |
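
The abstract contrasts the generic, CPU-centric OpenMP style (sequential code, fork-join parallel regions, shared stack variables) with the kernel-language style that users have typically written by hand for GPUs. The following C sketch illustrates that contrast. It is not taken from the paper: the function names `scale_generic`, `scale_kernel_style`, and `styles_agree` are hypothetical, and the example only assumes standard OpenMP directives. When compiled without OpenMP support, the pragmas are ignored and both functions run sequentially on the host.

```c
#define N 1024

/* CPU-centric ("generic") style: the target region starts with sequential
 * code run by a single initial thread, then forks a parallel region.
 * `factor` is a local (stack) variable of the initial thread that the
 * worker threads read; the abstract names this kind of sharing as costly
 * on GPUs. */
static void scale_generic(const double *in, double *out, int n) {
#pragma omp target map(to: in[0:n]) map(from: out[0:n])
  {
    double factor = 2.0;            /* single-threaded part */
#pragma omp parallel for            /* fork */
    for (int i = 0; i < n; ++i)
      out[i] = factor * in[i];
  }                                 /* join at the end of the loop */
}

/* Kernel-language style: one combined directive, no sequential code inside
 * the region, and `factor` is passed in by value instead of being shared. */
static void scale_kernel_style(const double *in, double *out, int n) {
  const double factor = 2.0;
#pragma omp target teams distribute parallel for \
    map(to: in[0:n]) map(from: out[0:n])
  for (int i = 0; i < n; ++i)
    out[i] = factor * in[i];
}

/* Hypothetical helper: returns 1 if both styles compute the same result. */
int styles_agree(void) {
  static double in[N], a[N], b[N];
  for (int i = 0; i < N; ++i)
    in[i] = 0.5 * i;
  scale_generic(in, a, N);
  scale_kernel_style(in, b, N);
  for (int i = 0; i < N; ++i)
    if (a[i] != b[i] || a[i] != 2.0 * in[i])
      return 0;
  return 1;
}
```

Calling `styles_agree()` from a `main` function checks that the two variants are functionally equivalent; the difference the abstract describes lies in how efficiently each maps to the GPU, not in the result. To try this on a device, an offload-capable toolchain is needed (for Clang, typically `-fopenmp` together with an `-fopenmp-targets=` option for the chosen architecture). The feedback on performed and missed optimizations mentioned in the abstract is, as far as LLVM's documentation describes it, exposed through optimization remarks such as `-Rpass=openmp-opt` and `-Rpass-missed=openmp-opt`.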