Integer Sum Reduction with OpenMP on an AMD MI100 GPU
Sum reduction is a primitive operation in parallel computing. Device offload support allows a user to use OpenMP directives to take advantage of a highly capable GPU. In this paper, we present the integer sum reduction annotated with the OpenMP directives and evaluate the performance impacts of tuna...
| Published in: | 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 496-499 |
|---|---|
| Main authors: | Jin, Zheming; Vetter, Jeffrey S. |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 01.05.2022 |
| Subjects: | |
| ISBN: | 9781665497480 |
| Online access: | Get full text |
| Abstract | Sum reduction is a primitive operation in parallel computing. Device offload support allows a user to use OpenMP directives to take advantage of a highly capable GPU. In this paper, we present the integer sum reduction annotated with the OpenMP directives and evaluate the performance impacts of tunable parameters with the AOMP and GCC compilers on an AMD MI100 GPU. In addition, we explain the implementations of the OpenMP reduction by the compilers. Sweeping over the pruned parameter space, we find that the speedup is approximately 20 with AOMP, and the reduction performance using AOMP is approximately 11% higher than that using GCC. However, the OpenMP offload performance is approximately 30% lower compared to the performance of the reductions written with rocThrust or hipCUB. |
|---|---|
| Author | Jin, Zheming; Vetter, Jeffrey S. |
| Author_xml | – sequence: 1 givenname: Zheming surname: Jin fullname: Jin, Zheming email: jinz@ornl.gov organization: Oak Ridge National Laboratory – sequence: 2 givenname: Jeffrey S. surname: Vetter fullname: Vetter, Jeffrey S. email: vetter@computer.org organization: Oak Ridge National Laboratory |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/IPDPSW55747.2022.00088 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| EISBN | 1665497475 9781665497473 |
| EndPage | 499 |
| ExternalDocumentID | 9835640 |
| Genre | orig-research |
| GroupedDBID | 6IE 6IF 6IL 6IN AAWTH ABLEC ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK IEGSK OCL RIB RIC RIE RIL |
| ID | FETCH-LOGICAL-i181t-982a366ebb009273dd6d842ea7f62d0290e883965fa03ff7f60ee4ed67f631cd3 |
| IEDL.DBID | RIE |
| ISBN | 9781665497480 |
| ISICitedReferencesCount | 2 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000855041000063&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Wed Aug 27 02:24:24 EDT 2025 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-i181t-982a366ebb009273dd6d842ea7f62d0290e883965fa03ff7f60ee4ed67f631cd3 |
| OpenAccessLink | https://www.osti.gov/biblio/1883905 |
| PageCount | 4 |
| ParticipantIDs | ieee_primary_9835640 |
| PublicationCentury | 2000 |
| PublicationDate | 2022-May |
| PublicationDateYYYYMMDD | 2022-05-01 |
| PublicationDate_xml | – month: 05 year: 2022 text: 2022-May |
| PublicationDecade | 2020 |
| PublicationTitle | 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) |
| PublicationTitleAbbrev | IPDPSW |
| PublicationYear | 2022 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssj0002931892 |
| Score | 1.8076737 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 496 |
| SubjectTerms | AMD GPU; Conferences; Distributed processing; Graphics processing units; Manuals; OpenMP target offload; Optimization; Parallel processing; Performance evaluation; Reduction |
| Title | Integer Sum Reduction with OpenMP on an AMD MI100 GPU |
| URI | https://ieeexplore.ieee.org/document/9835640 |
| WOSCitedRecordID | wos000855041000063&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
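
The abstract describes an integer sum reduction annotated with OpenMP directives and tuned through adjustable parameters. The following is a minimal sketch of what such a kernel might look like, not the authors' code: the function name `sum_reduce`, the parameters `teams` and `threads`, and the choice of `int` elements with a `long long` accumulator are all assumptions made for illustration. The record does not say which parameters the paper sweeps; the number of teams and the per-team thread limit are used here only as plausible examples.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sum n integers with an OpenMP target-offload reduction.
 *
 * teams   - requested number of teams (hypothetical tunable)
 * threads - requested upper bound on threads per team (hypothetical tunable)
 *
 * When the compiler is not invoked with OpenMP enabled, the pragma is
 * ignored and the loop runs serially on the host with the same result.
 */
long long sum_reduce(const int *data, size_t n, int teams, int threads)
{
    assert(data != NULL);

    long long sum = 0;

    /* Copy the input to the device, reduce across all teams and threads,
       and copy the scalar result back to the host. */
    #pragma omp target teams distribute parallel for \
        num_teams(teams) thread_limit(threads) \
        map(to: data[0:n]) map(tofrom: sum) reduction(+: sum)
    for (size_t i = 0; i < n; i++) {
        sum += data[i];
    }

    return sum;
}
```

A caller would pass a host array and a candidate configuration, for example `sum_reduce(a, n, 120, 256)`, and time the call for each configuration it wants to compare. The values 120 and 256 are arbitrary placeholders, and the flags needed to enable GPU offloading differ between compilers such as AOMP and GCC.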

