PALMED: Throughput Characterization for Superscalar Architectures - Extended Version

Bibliographic Details
Title: PALMED: Throughput Characterization for Superscalar Architectures - Extended Version
Authors: Derumigny, Nicolas; Bastian, Théophile; Gruber, Fabian; Iooss, Guillaume; Guillon, Christophe; Pouchet, Louis-Noel; Rastello, Fabrice
Contributors: Compiler Optimization and Run-time Systems (CORSE), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire d'Informatique de Grenoble (LIG), Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP), Université Grenoble Alpes (UGA)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP), Université Grenoble Alpes (UGA), Colorado State University Fort Collins (CSU), STMicroelectronics, European Project: 826276,CPS4EU(2019)
Source: https://inria.hal.science/hal-03114933 ; 2022.
Publisher Information: HAL CCSD
Publication Year: 2022
Collection: Université de Rennes 1: Publications scientifiques (HAL)
Subject Terms: performance model, port mapping, throughput, superscalar architecture, compiler, performance debugging, code selection, [INFO.INFO-AR]Computer Science [cs]/Hardware Architecture [cs.AR], [INFO.INFO-PF]Computer Science [cs]/Performance [cs.PF]
Description: In a super-scalar architecture, the scheduler dynamically assigns micro-operations (µOPs) to execution ports. The port mapping of an architecture describes how an instruction decomposes into µOPs and lists, for each µOP, the set of ports it can be mapped to. It is used by compilers and performance debugging tools to characterize the throughput of a sequence of instructions repeatedly executed as the core component of a loop. This paper introduces a dual equivalent representation: the resource mapping of an architecture is an abstract model where, to be executed, an instruction must use a set of abstract resources, themselves representing combinations of execution ports. For a given architecture, finding a port mapping is an important but difficult problem. Building a resource mapping is a more tractable problem and provides a simpler and equivalent model. This paper describes Palmed, a tool that automatically builds a resource mapping for pipelined, super-scalar, out-of-order CPU architectures. Palmed does not require hardware performance counters and relies solely on runtime measurements. We evaluate the pertinence of our dual representation for throughput modeling by extracting a representative set of basic blocks from the compiled binaries of the SPEC CPU 2017 benchmarks. We compared the throughput predicted by existing machine models to that produced by Palmed, and found comparable accuracy to state-of-the-art tools, achieving sub-10% mean square error rate on this workload on Intel's Skylake microarchitecture.
Document Type: report
Language: English
Relation: info:eu-repo/grantAgreement//826276/EU/Applying CPS technologies in modern manufacturing/CPS4EU
Availability: https://inria.hal.science/hal-03114933
https://inria.hal.science/hal-03114933v3/document
https://inria.hal.science/hal-03114933v3/file/main.pdf
Rights: info:eu-repo/semantics/OpenAccess
Accession Number: edsbas.7C84D000
Database: BASE