Achieving Computation-Communication Overlap with Overdecomposition on GPU Systems

Bibliographic Details
Published in:	2020 IEEE/ACM Fifth International Workshop on Extreme Scale Programming Models and Middleware (ESPM2), pp. 1-10
Main authors:	Choi, Jaemin; Richards, David F.; Kale, Laxmikant V.
Format:	Conference proceeding
Language:	English
Published:	IEEE, 1 November 2020
Subjects:	application program interfaces; asynchronous task-based runtime; Charm++ parallel programming system; communication latencies; computation-communication overlap; compute hardware; computer graphic equipment; coprocessors; Delays; GPU computing; graphics processing units; Hardware; high performance computing; Jacobian matrices; Kernel; message passing; modern GPU systems; multiGPU nodes; off-node communication capabilities; on-node compute; overdecomposition; parallel architectures; parallel processing; parallel programming; Runtime; Task analysis; traditional CPU-based systems
Online access:	Full text

Abstract	The landscape of high performance computing is shifting towards a collection of multi-GPU nodes, widening the gap between on-node compute and off-node communication capabilities. Consequently, the ability to tolerate communication latencies and maximize utilization of the compute hardware are becoming increasingly important in achieving high performance. Overdecomposition has been successfully adopted on traditional CPU-based systems to achieve computation-communication overlap, significantly reducing the impact of communication on application performance. However, it has been unclear whether overdecomposition can provide the same benefits on modern GPU systems. In this work, we address the challenges in achieving computation-communication overlap with overdecomposition on GPU systems using the Charm++ parallel programming system. By prioritizing communication with CUDA streams in the application and supporting asynchronous progress of GPU operations in the Charm++ runtime system, we obtain improvements in overall performance of up to 50% and 47% with proxy applications Jacobi3D and MiniMD, respectively.
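The overlap idea in the abstract can be illustrated with a small timing estimate. The sketch below is a toy two-stage pipeline model and is not taken from the paper: the function name `step_time`, its parameters, and the sample numbers are all hypothetical. It assumes the work divides evenly into chunks, that one chunk's communication can proceed while the next chunk computes, and that splitting adds no overhead (no extra kernel launch or scheduling cost).

```python
def step_time(compute, comm, chunks):
    """Toy estimate of one iteration's time under a two-stage pipeline.

    Assumption (not from the paper): the work splits evenly into `chunks`
    pieces, and a piece's communication overlaps the next piece's compute.
    """
    if chunks < 1:
        raise ValueError("chunks must be >= 1")
    c = compute / chunks  # compute time per chunk
    m = comm / chunks     # communication time per chunk
    # The first chunk computes alone, the last chunk communicates alone,
    # and the slower of the two stages paces everything in between.
    return c + (chunks - 1) * max(c, m) + m


if __name__ == "__main__":
    # Hypothetical workload: 8 time units of compute, 4 of communication.
    for k in (1, 2, 4, 8):
        print(k, step_time(compute=8.0, comm=4.0, chunks=k))
```

With a single chunk the estimate is simply compute plus communication. As the chunk count grows, the estimate approaches the larger of the two totals. On a real GPU system, per-chunk overheads work against that gain, and this model deliberately leaves them out.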
Author	Richards, David F.; Kale, Laxmikant V.; Choi, Jaemin
Author_xml	– sequence: 1 givenname: Jaemin surname: Choi fullname: Choi, Jaemin email: jchoi157@illinois.edu organization: University of Illinois at Urbana-Champaign,Department of Computer Science,Urbana,Illinois – sequence: 2 givenname: David F. surname: Richards fullname: Richards, David F. email: richards12@llnl.gov organization: Center for Applied Scientific Computing, Lawrence Livermore National Laboratory,Livermore,California – sequence: 3 givenname: Laxmikant V. surname: Kale fullname: Kale, Laxmikant V. email: kale@illinois.edu organization: University of Illinois at Urbana-Champaign,Department of Computer Science,Urbana,Illinois
CODEN	IEEPAD
ContentType	Conference Proceeding
DBID	6IE 6IL CBEJK RIE RIL
DOI	10.1109/ESPM251964.2020.00006
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml	– sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher
DeliveryMethod	fulltext_linktorsrc
EISBN	9781665422840 166542284X
EndPage	10
ExternalDocumentID	9307123
Genre	orig-research
GrantInformation_xml	– fundername: Office of Science funderid: 10.13039/100006132 – fundername: U.S. DOE grantid: DE-AC05-000R22725 funderid: 10.13039/100000015 – fundername: Lawrence Livermore National Laboratory grantid: DE-AC52-07NA27344 (LLNL-CONF-814558) funderid: 10.13039/100006227 – fundername: U.S. Department of Energy (DOE) funderid: 10.13039/100000015
GroupedDBID	6IE 6IL CBEJK RIE RIL
IEDL.DBID	RIE
ISICitedReferencesCount	18
IngestDate	Thu Jun 29 18:39:04 EDT 2023
IsPeerReviewed	false
IsScholarly	false
Language	English
LinkModel	DirectLink
PageCount	10
ParticipantIDs	ieee_primary_9307123
PublicationCentury	2000
PublicationDate	2020-Nov.
PublicationDateYYYYMMDD	2020-11-01
PublicationDate_xml	– month: 11 year: 2020 text: 2020-Nov.
PublicationDecade	2020
PublicationTitle	2020 IEEE/ACM Fifth International Workshop on Extreme Scale Programming Models and Middleware (ESPM2)
PublicationTitleAbbrev	ESPM2
PublicationYear	2020
Publisher	IEEE
Publisher_xml	– name: IEEE
Score	1.851182
SourceID	ieee
SourceType	Publisher
StartPage	1
Title	Achieving Computation-Communication Overlap with Overdecomposition on GPU Systems
URI	https://ieeexplore.ieee.org/document/9307123