OptiWISE: Combining Sampling and Instrumentation for Granular CPI Analysis

Bibliographic Details
Published in:	Proceedings / International Symposium on Code Generation and Optimization, pp. 373-385
Main Authors:	Guo, Yuxin, Chadwick, Alex W., Erdos, Marton, Bora, Utpal, Vougioukas, Ilias, Gabrielli, Giacomo, Jones, Timothy M.
Format:	Conference Proceeding
Language:	English
Published:	IEEE, 2 March 2024
Subjects:	Benchmark testing Codes Costs Hardware Instruments Measurement Optimization Servers
ISSN:	2643-2838
Online Access:	Full text

Description
Summary:	Despite decades of improvement in compiler technology, it remains necessary to profile applications to improve performance. Existing profiling tools typically either sample hardware performance counters or instrument the program with extra instructions to analyze its execution. Both techniques are valuable, with different strengths and weaknesses, but do not always correctly identify optimization opportunities. We present OPTIWISE, a profiling tool that runs the program twice, once with low-overhead sampling to accurately measure performance, and once with instrumentation to accurately capture control flow and execution counts. OPTIWISE then combines this information to give a highly detailed per-instruction CPI metric by computing the ratio of samples to execution counts, as well as aggregated information such as costs per loop, source-code line, or function. We evaluate OPTIWISE to show it has an overhead of 8.1× geomean, and 57× worst case on SPEC CPU2017 benchmarks. Using OPTIWISE, we present case studies of optimizing selected SPEC benchmarks on a modern x86 server processor. The per-instruction CPI metrics quickly reveal problems such as costly mispredicted branches and cache misses, which we use to manually optimize for effective performance improvements.
DOI:	10.1109/CGO57630.2024.10444771
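
Illustrative note: the summary describes the per-instruction CPI as the ratio of samples to execution counts, with costs also aggregated per loop, source-code line, or function. The following is a minimal sketch of that arithmetic only, not the tool's actual implementation. The function names, the dictionary-based inputs, and the `sample_period` parameter (the assumed number of cycles represented by one sample) are all hypothetical and do not come from the record.

```python
from collections import defaultdict


def per_instruction_cpi(samples, exec_counts, sample_period=1):
    """Estimate cycles per instruction for each instruction address.

    samples:       {address: samples attributed to it in the sampling run}
    exec_counts:   {address: times it executed in the instrumented run}
    sample_period: assumed cycles represented by one sample (hypothetical)
    """
    cpi = {}
    for addr, count in exec_counts.items():
        if count == 0:
            continue  # never executed, so the ratio is undefined
        cpi[addr] = samples.get(addr, 0) * sample_period / count
    return cpi


def aggregate_cycles(samples, addr_to_group, sample_period=1):
    """Sum estimated cycles per group (e.g. loop, source line, or function)."""
    totals = defaultdict(int)
    for addr, n in samples.items():
        totals[addr_to_group.get(addr, "<unknown>")] += n * sample_period
    return dict(totals)


# Made-up data: three instructions, each executed 100,000 times.
samples = {0x401000: 50, 0x401004: 400, 0x401008: 50}
exec_counts = {0x401000: 100_000, 0x401004: 100_000, 0x401008: 100_000}
groups = {0x401000: "loop_a", 0x401004: "loop_a", 0x401008: "main"}

cpi = per_instruction_cpi(samples, exec_counts, sample_period=1000)
cycles = aggregate_cycles(samples, groups, sample_period=1000)

for addr in sorted(cpi):
    print(f"{addr:#x}: CPI = {cpi[addr]:.2f}")
print(cycles)
```

In this made-up data, the instruction at `0x401004` stands out with a much higher ratio than its neighbours despite the same execution count, which is the kind of signal the summary attributes to costly mispredicted branches and cache misses.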