Achieving Computation-Communication Overlap with Overdecomposition on GPU Systems
The landscape of high performance computing is shifting towards a collection of multi-GPU nodes, widening the gap between on-node compute and off-node communication capabilities. Consequently, the ability to tolerate communication latencies and maximize utilization of the compute hardware are becomi...
| Published in: | 2020 IEEE/ACM Fifth International Workshop on Extreme Scale Programming Models and Middleware (ESPM2), pp. 1-10 |
|---|---|
| Main Authors: | Choi, Jaemin; Richards, David F.; Kale, Laxmikant V. |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 2020-11-01 |
| Subjects: | |
| Online Access: | Full text |
| Abstract | The landscape of high performance computing is shifting towards a collection of multi-GPU nodes, widening the gap between on-node compute and off-node communication capabilities. Consequently, the ability to tolerate communication latencies and maximize utilization of the compute hardware are becoming increasingly important in achieving high performance. Overdecomposition has been successfully adopted on traditional CPU-based systems to achieve computation-communication overlap, significantly reducing the impact of communication on application performance. However, it has been unclear whether overdecomposition can provide the same benefits on modern GPU systems. In this work, we address the challenges in achieving computation-communication overlap with overdecomposition on GPU systems using the Charm++ parallel programming system. By prioritizing communication with CUDA streams in the application and supporting asynchronous progress of GPU operations in the Charm++ runtime system, we obtain improvements in overall performance of up to 50% and 47% with proxy applications Jacobi3D and MiniMD, respectively. |
|---|---|
| Author | Richards, David F.; Kale, Laxmikant V.; Choi, Jaemin |
| Author_xml | – sequence: 1 givenname: Jaemin surname: Choi fullname: Choi, Jaemin email: jchoi157@illinois.edu organization: University of Illinois at Urbana-Champaign,Department of Computer Science,Urbana,Illinois – sequence: 2 givenname: David F. surname: Richards fullname: Richards, David F. email: richards12@llnl.gov organization: Center for Applied Scientific Computing, Lawrence Livermore National Laboratory,Livermore,California – sequence: 3 givenname: Laxmikant V. surname: Kale fullname: Kale, Laxmikant V. email: kale@illinois.edu organization: University of Illinois at Urbana-Champaign,Department of Computer Science,Urbana,Illinois |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/ESPM251964.2020.00006 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| EISBN | 9781665422840 166542284X |
| EndPage | 10 |
| ExternalDocumentID | 9307123 |
| Genre | orig-research |
| GrantInformation_xml | – fundername: Office of Science funderid: 10.13039/100006132 – fundername: U.S. DOE grantid: DE-AC05-00OR22725 funderid: 10.13039/100000015 – fundername: Lawrence Livermore National Laboratory grantid: DE-AC52-07NA27344 (LLNL-CONF-814558) funderid: 10.13039/100006227 – fundername: U.S. Department of Energy (DOE) funderid: 10.13039/100000015 |
| GroupedDBID | 6IE 6IL CBEJK RIE RIL |
| ID | FETCH-LOGICAL-i203t-421d58d9db4c1194591409a22657e333a4264e0978e74e93d1776d16f970b0cf3 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 18 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000674882700001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Thu Jun 29 18:39:04 EDT 2023 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-i203t-421d58d9db4c1194591409a22657e333a4264e0978e74e93d1776d16f970b0cf3 |
| PageCount | 10 |
| ParticipantIDs | ieee_primary_9307123 |
| PublicationCentury | 2000 |
| PublicationDate | 2020-Nov. |
| PublicationDateYYYYMMDD | 2020-11-01 |
| PublicationDate_xml | – month: 11 year: 2020 text: 2020-Nov. |
| PublicationDecade | 2020 |
| PublicationTitle | 2020 IEEE/ACM Fifth International Workshop on Extreme Scale Programming Models and Middleware (ESPM2) |
| PublicationTitleAbbrev | ESPM2 |
| PublicationYear | 2020 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| Score | 1.851182 |
| Snippet | The landscape of high performance computing is shifting towards a collection of multi-GPU nodes, widening the gap between on-node compute and off-node... |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 1 |
| SubjectTerms | application program interfaces; asynchronous task-based runtime; Charm++ parallel programming system; communication latencies; computation-communication overlap; compute hardware; computer graphic equipment; coprocessors; Delays; GPU computing; graphics processing units; Hardware; high performance computing; Jacobian matrices; Kernel; message passing; modern GPU systems; multiGPU nodes; off-node communication capabilities; on-node compute; overde-composition; overdecomposition; parallel architectures; parallel processing; parallel programming; Runtime; Task analysis; traditional CPU-based systems |
| Title | Achieving Computation-Communication Overlap with Overdecomposition on GPU Systems |
| URI | https://ieeexplore.ieee.org/document/9307123 |
| WOSCitedRecordID | wos000674882700001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | IEEE |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=2020+IEEE%2FACM+Fifth+International+Workshop+on+Extreme+Scale+Programming+Models+and+Middleware+%28ESPM2%29&rft.atitle=Achieving+Computation-Communication+Overlap+with+Overdecomposition+on+GPU+Systems&rft.au=Choi%2C+Jaemin&rft.au=Richards%2C+David+F.&rft.au=Kale%2C+Laxmikant+V.&rft.date=2020-11-01&rft.pub=IEEE&rft.spage=1&rft.epage=10&rft_id=info:doi/10.1109%2FESPM251964.2020.00006&rft.externalDocID=9307123 |
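
The abstract describes overdecomposition as a way to overlap computation with communication. The following is a minimal sketch of that idea as a toy two-stage pipeline model in Python. It is not the authors' method and does not model CUDA streams or the Charm++ runtime. The function name `iteration_time`, its parameters, and the assumptions that work and communication split evenly across chunks with no per-chunk overhead are all hypothetical choices made for illustration.

```python
def iteration_time(compute, comm, odf):
    """Estimate one iteration's time under a toy pipeline model.

    compute: total compute time per iteration (arbitrary units)
    comm:    total communication time per iteration (same units)
    odf:     overdecomposition factor, i.e. number of chunks per processor

    Assumptions (illustrative only): chunks compute one after another,
    a chunk's communication starts once its own compute has finished,
    communications are serialized, and splitting adds no overhead.
    """
    chunk_compute = compute / odf
    chunk_comm = comm / odf
    compute_done = 0.0  # time at which the latest chunk finished computing
    comm_done = 0.0     # time at which the latest chunk finished communicating
    for _ in range(odf):
        compute_done += chunk_compute
        # Communication waits for both the network and this chunk's compute.
        comm_done = max(comm_done, compute_done) + chunk_comm
    return comm_done


if __name__ == "__main__":
    for odf in (1, 2, 4):
        print(odf, iteration_time(8.0, 4.0, odf))
```

In this model a single chunk cannot hide any communication, while more chunks let one chunk's communication proceed during another chunk's compute. Real systems add per-chunk costs such as kernel launch and scheduling overhead, which this sketch deliberately ignores, so the benefit does not grow without bound in practice.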