OptiWISE: Combining Sampling and Instrumentation for Granular CPI Analysis


Published in: Proceedings / International Symposium on Code Generation and Optimization, pp. 373-385
Main authors: Guo, Yuxin; Chadwick, Alex W.; Erdos, Marton; Bora, Utpal; Vougioukas, Ilias; Gabrielli, Giacomo; Jones, Timothy M.
Format: Conference paper
Language: English
Publication details: IEEE, 2 March 2024
ISSN:2643-2838
Abstract Despite decades of improvement in compiler technology, it remains necessary to profile applications to improve performance. Existing profiling tools typically either sample hardware performance counters or instrument the program with extra instructions to analyze its execution. Both techniques are valuable with different strengths and weaknesses, but do not always correctly identify optimization opportunities. We present OPTIWISE, a profiling tool that runs the program twice, once with low-overhead sampling to accurately measure performance, and once with instrumentation to accurately capture control flow and execution counts. OPTIWISE then combines this information to give a highly detailed per-instruction CPI metric by computing the ratio of samples to execution counts, as well as aggregated information such as costs per loop, source-code line, or function. We evaluate OPTIWISE to show it has an overhead of 8.1× geomean, and 57× worst case on SPEC CPU2017 benchmarks. Using OPTIWISE, we present case studies of optimizing selected SPEC benchmarks on a modern x86 server processor. The per-instruction CPI metrics quickly reveal problems such as costly mispredicted branches and cache misses, which we use to manually optimize for effective performance improvements.
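The abstract describes per-instruction CPI as the ratio of samples to execution counts. The following is a minimal sketch of that calculation, not the tool's actual implementation. It assumes that every sample stands for a fixed number of cycles (`cycles_per_sample`), and that both runs can be summarized as plain dictionaries keyed by instruction address. The function names, the addresses, and the `foo.c` line labels are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def per_instruction_cpi(samples, exec_counts, cycles_per_sample):
    """Estimate cycles per instruction (CPI) for each instruction address.

    samples:           addr -> number of samples attributed to it (sampling run)
    exec_counts:       addr -> number of times it executed (instrumentation run)
    cycles_per_sample: assumed sampling period in cycles (an assumption of this sketch)
    """
    cpi = {}
    for addr, count in exec_counts.items():
        if count == 0:
            continue  # never executed, so CPI is undefined
        cpi[addr] = samples.get(addr, 0) * cycles_per_sample / count
    return cpi


def aggregate_cycles(samples, addr_to_group, cycles_per_sample):
    """Sum estimated cycles per group (for example loop, source line, or function)."""
    totals = defaultdict(int)
    for addr, n in samples.items():
        totals[addr_to_group.get(addr, "<unknown>")] += n * cycles_per_sample
    return dict(totals)


# Hypothetical data: three instructions in a loop that ran 10,000 times.
samples = {0x401000: 5, 0x401004: 120, 0x401008: 3}
exec_counts = {0x401000: 10_000, 0x401004: 10_000, 0x401008: 10_000}
addr_to_line = {0x401000: "foo.c:10", 0x401004: "foo.c:10", 0x401008: "foo.c:11"}

cpi = per_instruction_cpi(samples, exec_counts, cycles_per_sample=1000)
line_cycles = aggregate_cycles(samples, addr_to_line, cycles_per_sample=1000)

for addr in sorted(cpi):
    print(f"{addr:#x}: CPI = {cpi[addr]:.2f}")
```

In this made-up data all three instructions execute equally often, so only the sampling run distinguishes them: the instruction at `0x401004` collects far more samples per execution and therefore shows a much higher CPI, which is the kind of outlier the abstract associates with mispredicted branches and cache misses.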
Author Erdos, Marton
Vougioukas, Ilias
Jones, Timothy M.
Chadwick, Alex W.
Bora, Utpal
Gabrielli, Giacomo
Guo, Yuxin
Author_xml – sequence: 1
  givenname: Yuxin
  surname: Guo
  fullname: Guo, Yuxin
  email: yg413@cl.cam.ac.uk
  organization: University of Cambridge,UK
– sequence: 2
  givenname: Alex W.
  surname: Chadwick
  fullname: Chadwick, Alex W.
  email: alex.chadwick@cl.cam.ac.uk
  organization: University of Cambridge,UK
– sequence: 3
  givenname: Marton
  surname: Erdos
  fullname: Erdos, Marton
  email: marton.erdos@cl.cam.ac.uk
  organization: University of Cambridge,UK
– sequence: 4
  givenname: Utpal
  surname: Bora
  fullname: Bora, Utpal
  email: ub230@cl.cam.ac.uk
  organization: University of Cambridge,UK
– sequence: 5
  givenname: Ilias
  surname: Vougioukas
  fullname: Vougioukas, Ilias
  email: ilias.vougioukas@arm.com
  organization: Arm,USA
– sequence: 6
  givenname: Giacomo
  surname: Gabrielli
  fullname: Gabrielli, Giacomo
  email: giacomo.gabrielli@arm.com
  organization: Arm,UK
– sequence: 7
  givenname: Timothy M.
  surname: Jones
  fullname: Jones, Timothy M.
  email: timothy.jones@cl.cam.ac.uk
  organization: University of Cambridge,UK
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/CGO57630.2024.10444771
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 9798350395099
EISSN 2643-2838
EndPage 385
ExternalDocumentID 10444771
Genre orig-research
GroupedDBID 29O
6IE
6IF
6IK
6IL
6IN
AAJGR
ABLEC
ADZIZ
ALMA_UNASSIGNED_HOLDINGS
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
IEGSK
IPLJI
OCL
RIE
RIL
ID FETCH-LOGICAL-a338t-64b402c16d1a3ce11608fffe52a7b9b3292ceb70ccb9547f18a3f35d852a32dd3
IEDL.DBID RIE
ISICitedReferencesCount 3
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001179185400030&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Aug 27 03:08:40 EDT 2025
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-a338t-64b402c16d1a3ce11608fffe52a7b9b3292ceb70ccb9547f18a3f35d852a32dd3
OpenAccessLink https://www.repository.cam.ac.uk/handle/1810/363050
PageCount 13
ParticipantIDs ieee_primary_10444771
PublicationCentury 2000
PublicationDate 2024-March-2
PublicationDateYYYYMMDD 2024-03-02
PublicationDate_xml – month: 03
  year: 2024
  text: 2024-March-2
  day: 02
PublicationDecade 2020
PublicationTitle Proceedings / International Symposium on Code Generation and Optimization
PublicationTitleAbbrev CGO
PublicationYear 2024
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0027413
ssib057256076
Score 2.2695713
SourceID ieee
SourceType Publisher
StartPage 373
SubjectTerms Benchmark testing
Codes
Costs
Hardware
Instruments
Measurement
Optimization
Servers
Title OptiWISE: Combining Sampling and Instrumentation for Granular CPI Analysis
URI https://ieeexplore.ieee.org/document/10444771
WOSCitedRecordID wos001179185400030&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D