Integer Sum Reduction with OpenMP on an AMD MI100 GPU

Detailed bibliography
Published in:	2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 496-499
Main authors:	Jin, Zheming; Vetter, Jeffrey S.
Format:	Conference paper
Language:	English
Published:	IEEE, 1 May 2022
Subjects:	AMD GPU; Conferences; Distributed processing; Graphics processing units; Manuals; OpenMP target offload; Optimization; Parallel processing; Performance evaluation; Reduction
ISBN:	9781665497480

Abstract	Sum reduction is a primitive operation in parallel computing. Device offload support allows a user to use OpenMP directives to take advantage of a highly capable GPU. In this paper, we present the integer sum reduction annotated with the OpenMP directives and evaluate the performance impacts of tunable parameters with the AOMP and GCC compilers on an AMD MI100 GPU. In addition, we explain the implementations of the OpenMP reduction by the compilers. Sweeping over the pruned parameter space, we find that the speedup is approximately 20 with AOMP, and the reduction performance using AOMP is approximately 11% higher than that using GCC. However, the OpenMP offload performance is approximately 30% lower compared to the performance of the reductions written with rocThrust or hipCUB.
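
As an illustration of the kind of kernel the abstract describes, the following is a minimal C sketch of an integer sum reduction offloaded with OpenMP target directives. It is not the authors' code: the function name sum_reduce, the parameters teams and threads, and the 64-bit accumulator are assumptions made for this example. The num_teams and thread_limit clauses are shown only as plausible examples of tunable parameters; this record does not say which parameters the paper sweeps.

```c
#include <stddef.h>

/* Hypothetical helper (not from the paper): sums n integers on the
 * default OpenMP target device. `teams` and `threads` are illustrative
 * tuning knobs passed to num_teams and thread_limit. */
long long sum_reduce(const int *a, int n, int teams, int threads)
{
    long long sum = 0; /* 64-bit accumulator: an assumption, to avoid overflow */

    /* Copy the input to the device, split iterations across teams and
     * threads, and combine the per-thread partial sums into `sum`. */
    #pragma omp target teams distribute parallel for \
        num_teams(teams) thread_limit(threads) \
        map(to: a[0:n]) map(tofrom: sum) reduction(+:sum)
    for (int i = 0; i < n; i++) {
        sum += a[i];
    }

    return sum;
}
```

When built with an offload-capable compiler, the loop runs on the device. When built without OpenMP support, the pragma is ignored and the loop runs serially on the host with the same result. For example, calling sum_reduce on the array {1, 2, 3, 4} with any positive teams and threads values returns 10.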
Author affiliations	Zheming Jin (jinz@ornl.gov), Oak Ridge National Laboratory; Jeffrey S. Vetter (vetter@computer.org), Oak Ridge National Laboratory
CODEN	IEEPAD
ContentType	Conference Proceeding
DOI	10.1109/IPDPSW55747.2022.00088
EISBN	1665497475 9781665497473
OpenAccessLink	https://www.osti.gov/biblio/1883905
URI	https://ieeexplore.ieee.org/document/9835640