GPA: A GPU Performance Advisor Based on Instruction Sampling


Detailed bibliography
Published in: 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pp. 115-125
Main authors: Zhou, Keren; Meng, Xiaozhu; Sai, Ryuichi; Mellor-Crummey, John
Format: Conference paper
Language: English
Published: IEEE, 27 February 2021
Abstract Developing efficient GPU kernels can be difficult because of the complexity of GPU architectures and programming models. Existing performance tools only provide coarse-grained tuning advice at the kernel level, if any. In this paper, we describe GPA, a performance advisor for NVIDIA GPUs that suggests potential code optimizations at a hierarchy of levels, including individual lines, loops, and functions. To relieve users of the burden of interpreting performance counters and analyzing bottlenecks, GPA uses data flow analysis to approximately attribute measured instruction stalls to their root causes and uses information about a program's structure and the GPU to match inefficiency patterns with optimization strategies. To quantify the potential benefits of each optimization strategy, we developed PC sampling-based performance models to estimate its speedup. Our experiments with benchmarks and applications show that GPA provides insightful reports to guide performance optimization. Using GPA, we obtained speedups on a Volta V100 GPU ranging from 1.01× to 3.58×, with a geometric mean of 1.22×.
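The idea of estimating a speedup from sampled stalls, and of summarizing several speedups with a geometric mean, can be illustrated with a minimal sketch. This is not GPA's actual performance model, which the abstract does not spell out. It assumes a simple bound in which an optimization removes all samples attributed to one stall reason. The function names, the stall-reason labels, and the sample counts are hypothetical.

```python
import math


def estimate_speedup(total_samples, removed_stall_samples):
    """Optimistic speedup if the given stall samples vanished entirely.

    Assumption (not from the paper): execution time is proportional to the
    number of PC samples, so time after optimization is
    total_samples - removed_stall_samples.
    """
    if not 0 <= removed_stall_samples < total_samples:
        raise ValueError("removed samples must be in [0, total_samples)")
    return total_samples / (total_samples - removed_stall_samples)


def geometric_mean(speedups):
    """Geometric mean, the usual way to average multiplicative speedups."""
    if not speedups:
        raise ValueError("need at least one speedup")
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))


# Hypothetical PC-sample counts per stall reason for one kernel.
samples = {"memory_dependency": 400, "execution_dependency": 100, "issued": 500}
total = sum(samples.values())

# Bound on the gain from an optimization that hides all memory stalls.
speedup = estimate_speedup(total, samples["memory_dependency"])
print(f"estimated speedup: {speedup:.2f}x")
```

A real model would also have to account for stalls that overlap with useful work on other warps, which is why this simple ratio is only an upper bound under the stated assumption.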
Author Zhou, Keren
Sai, Ryuichi
Mellor-Crummey, John
Meng, Xiaozhu
Author_xml – sequence: 1
  givenname: Keren
  surname: Zhou
  fullname: Zhou, Keren
  email: keren.zhou@rice.edu
  organization: Rice University, Department of Computer Science, Houston, Texas
– sequence: 2
  givenname: Xiaozhu
  surname: Meng
  fullname: Meng, Xiaozhu
  email: xm13@rice.edu
  organization: Rice University, Department of Computer Science, Houston, Texas
– sequence: 3
  givenname: Ryuichi
  surname: Sai
  fullname: Sai, Ryuichi
  email: ryuichi@rice.edu
  organization: Rice University, Department of Computer Science, Houston, Texas
– sequence: 4
  givenname: John
  surname: Mellor-Crummey
  fullname: Mellor-Crummey, John
  email: johnmc@rice.edu
  organization: Rice University, Department of Computer Science, Houston, Texas
ContentType Conference Proceeding
DOI 10.1109/CGO51591.2021.9370339
EISBN 1728186137
9781728186139
EndPage 125
ExternalDocumentID 9370339
Genre orig-research
ISICitedReferencesCount 16
IsPeerReviewed false
IsScholarly true
Language English
PageCount 11
PublicationCentury 2000
PublicationDate 2021-Feb.-27
PublicationDateYYYYMMDD 2021-02-27
PublicationDate_xml – month: 02
  year: 2021
  text: 2021-Feb.-27
  day: 27
PublicationDecade 2020
PublicationTitle 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)
PublicationTitleAbbrev CGO
PublicationYear 2021
Publisher IEEE
Publisher_xml – name: IEEE
StartPage 115
SubjectTerms Computer architecture
Graphics processing units
High performance computing
Kernel
Optimization
Parallel architectures
Parallel programming
Performance analysis
Programming
Tuning
Title GPA: A GPU Performance Advisor Based on Instruction Sampling
URI https://ieeexplore.ieee.org/document/9370339