GPA: A GPU Performance Advisor Based on Instruction Sampling
| Published in: | 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pp. 115–125 |
|---|---|
| Main authors: | Zhou, Keren; Meng, Xiaozhu; Sai, Ryuichi; Mellor-Crummey, John |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 27 February 2021 |
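| Subjects: | Computer architecture; Graphics processing units; High performance computing; Kernel; Optimization; Parallel architectures; Parallel programming; Performance analysis; Programming; Tuning |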
| Online access: | Full text |
| Abstract | Developing efficient GPU kernels can be difficult because of the complexity of GPU architectures and programming models. Existing performance tools only provide coarse-grained tuning advice at the kernel level, if any. In this paper, we describe GPA, a performance advisor for NVIDIA GPUs that suggests potential code optimizations at a hierarchy of levels, including individual lines, loops, and functions. To relieve users of the burden of interpreting performance counters and analyzing bottlenecks, GPA uses data flow analysis to approximately attribute measured instruction stalls to their root causes and uses information about a program's structure and the GPU to match inefficiency patterns with optimization strategies. To quantify the potential benefits of each optimization strategy, we developed PC sampling-based performance models to estimate its speedup. Our experiments with benchmarks and applications show that GPA provides insightful reports to guide performance optimization. Using GPA, we obtained speedups on a Volta V100 GPU ranging from 1.01× to 3.58×, with a geometric mean of 1.22×. |
|---|---|
| Author | Zhou, Keren; Sai, Ryuichi; Mellor-Crummey, John; Meng, Xiaozhu |
| Author_xml | 1. Keren Zhou (keren.zhou@rice.edu), Rice University, Department of Computer Science, Houston, Texas; 2. Xiaozhu Meng (xm13@rice.edu), Rice University, Department of Computer Science, Houston, Texas; 3. Ryuichi Sai (ryuichi@rice.edu), Rice University, Department of Computer Science, Houston, Texas; 4. John Mellor-Crummey (johnmc@rice.edu), Rice University, Department of Computer Science, Houston, Texas |
| ContentType | Conference Proceeding |
| DOI | 10.1109/CGO51591.2021.9370339 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| EISBN | 1728186137; 9781728186139 |
| EndPage | 125 |
| ExternalDocumentID | 9370339 |
| Genre | orig-research |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 16 |
| IngestDate | Wed Aug 27 02:30:10 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| PageCount | 11 |
| ParticipantIDs | ieee_primary_9370339 |
| PublicationCentury | 2000 |
| PublicationDate | 2021-Feb.-27 |
| PublicationDateYYYYMMDD | 2021-02-27 |
| PublicationDate_xml | – month: 02 year: 2021 text: 2021-Feb.-27 day: 27 |
| PublicationDecade | 2020 |
| PublicationTitle | 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) |
| PublicationTitleAbbrev | CGO |
| PublicationYear | 2021 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 115 |
| SubjectTerms | Computer architecture; Graphics processing units; High performance computing; Kernel; Optimization; Parallel architectures; Parallel programming; Performance analysis; Programming; Tuning |
| Title | GPA: A GPU Performance Advisor Based on Instruction Sampling |
| URI | https://ieeexplore.ieee.org/document/9370339 |
| hasFullText | 1 |
| inHoldings | 1 |
| linkProvider | IEEE |
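
The abstract summarizes two steps: attributing sampled instruction stalls to their root causes with data flow analysis, and estimating the speedup of an optimization from PC samples. The Python sketch below illustrates both ideas under simplifying assumptions. It is not GPA's actual algorithm or data format. The `Instr` record, register names, opcodes, and stall counts are hypothetical. The code assumes straight-line code, so the most recent earlier writer of a register is its only reaching definition. It also assumes runtime is proportional to the number of PC samples, which gives an Amdahl-style estimate: speedup = total samples / (total samples - removable samples).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: address, opcode, registers written/read."""
    pc: int
    opcode: str
    dst: tuple = ()
    src: tuple = ()


def attribute_stalls(instrs, stall_samples):
    """Blame sampled dependency stalls on the instructions that produced the operands.

    Simplification (assumption): straight-line code, so the latest earlier writer
    of a register is its only reaching definition. A stall with several producers
    is split evenly; a stall with no producer in the list stays at its own PC.
    """
    blame = Counter()
    last_def = {}  # register name -> PC of its most recent writer
    for ins in instrs:
        samples = stall_samples.get(ins.pc, 0)
        producers = {last_def[r] for r in ins.src if r in last_def}
        if samples and producers:
            for pc in producers:
                blame[pc] += samples / len(producers)
        elif samples:
            blame[ins.pc] += samples
        for r in ins.dst:
            last_def[r] = ins.pc
    return blame


def estimate_speedup(total_samples, removable_samples):
    """Amdahl-style estimate assuming runtime is proportional to PC samples."""
    if not 0 <= removable_samples < total_samples:
        raise ValueError("require 0 <= removable_samples < total_samples")
    return total_samples / (total_samples - removable_samples)


# Hypothetical kernel fragment and stall counts (illustrative numbers only).
kernel = [
    Instr(0x10, "LDG", dst=("R2",), src=("R0",)),        # global memory load
    Instr(0x20, "IADD", dst=("R4",), src=("R1", "R1")),  # operand defined elsewhere
    Instr(0x30, "FFMA", dst=("R5",), src=("R2", "R3")),  # waits on the load's R2
]
stalls = {0x20: 10, 0x30: 60}

blame = attribute_stalls(kernel, stalls)
print(dict(blame))                                   # {32: 10, 16: 60.0}
print(round(estimate_speedup(200, blame[0x10]), 3))  # 1.429
```

In this toy example, the 60 stall samples observed at the FFMA are blamed on the load at `0x10`. If an optimization hid that load's latency completely out of 200 total samples, the estimate would be 200 / 140 ≈ 1.43×. Splitting a stall evenly among multiple producers is just one simple choice; a real tool could weight producers differently.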