Alternate Path μ-op Cache Prefetching

Bibliographic Details
Published in:2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA) pp. 1230 - 1245
Main Authors: Singh, Sawan, Perais, Arthur, Jimborean, Alexandra, Ros, Alberto
Format: Conference Proceeding
Language:English
Published: IEEE 29.06.2024
Abstract Datacenter applications are well-known for their large code footprints. This has caused frontend design to evolve by implementing decoupled fetching and large prediction structures - branch predictors, Branch Target Buffers (BTBs) - to mitigate the stagnating size of the instruction cache by prefetching instructions well in advance. In addition, many designs feature a micro-operation (μ-op) cache, which primarily provides power savings by bypassing the instruction cache and decoders once warmed up. However, this μ-op cache often has lower reach than the instruction cache, and it is not filled speculatively by the decoupled fetcher. As a result, the μ-op cache is often over-subscribed by datacenter applications, to the point of becoming a burden. This paper first shows that, because of this pressure, blindly prefetching into the μ-op cache using state-of-the-art standalone prefetchers would not provide significant gains. As a consequence, this paper proposes to prefetch only critical μ-ops into the μ-op cache, by focusing on execution points where the μ-op cache provides the most gains: pipeline refills. Concretely, we use hard-to-predict conditional branches as indicators that a pipeline refill is likely to happen in the near future, and prefetch into the μ-op cache the μ-ops that belong to the path opposed to the predicted path, which we call the alternate path. Identifying hard-to-predict branches requires no additional state if the branch predictor confidence is used to classify branches. Including extra alternate branch predictors with a limited budget (8.95 KB to 12.95 KB), our proposal provides average speedups of 1.9% to 2%, and as high as 12% on a subset of CVP-1 traces.
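The trigger mechanism the abstract describes - treat a low-confidence conditional branch as a sign of an imminent pipeline refill, and prefetch the μ-ops of the not-predicted (alternate) path into the μ-op cache - can be sketched as follows. This is a behavioral illustration of the idea only, not the paper's hardware design; all class names, the `prefetch` interface, and the confidence threshold value are hypothetical.

```python
# Behavioral sketch of confidence-gated alternate-path prefetching.
# All names and the threshold value are illustrative assumptions,
# not taken from the paper.

CONF_THRESHOLD = 2  # saturating-counter confidence below which a branch
                    # is classified as hard to predict (assumed value)

class UopCache:
    """Toy μ-op cache that just records prefetch requests by start PC."""
    def __init__(self):
        self.prefetched = []

    def prefetch(self, pc):
        self.prefetched.append(pc)

class AlternatePathPrefetcher:
    """Issues μ-op cache prefetches along the path NOT chosen by the
    branch predictor, but only for low-confidence branches, where a
    misprediction (and thus a pipeline refill) is likely."""
    def __init__(self, uop_cache):
        self.uop_cache = uop_cache

    def on_branch_prediction(self, predicted_taken, taken_target,
                             fallthrough_pc, confidence):
        # High confidence: a refill is unlikely, so avoid polluting the
        # already over-subscribed μ-op cache with alternate-path μ-ops.
        if confidence >= CONF_THRESHOLD:
            return
        # Low confidence: prefetch the path opposed to the prediction.
        alternate_pc = fallthrough_pc if predicted_taken else taken_target
        self.uop_cache.prefetch(alternate_pc)

cache = UopCache()
pf = AlternatePathPrefetcher(cache)
# Hard-to-predict branch, predicted taken: prefetch the fall-through path.
pf.on_branch_prediction(True, taken_target=0x500, fallthrough_pc=0x404,
                        confidence=1)
# Confident branch: no prefetch issued.
pf.on_branch_prediction(False, taken_target=0x700, fallthrough_pc=0x604,
                        confidence=3)
print(cache.prefetched)  # [0x404] only
```

The key design point the abstract makes is that the confidence gate is what makes this viable: an ungated ("blind") prefetcher would only add pressure to a structure that datacenter codes already over-subscribe.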
Author_xml – sequence: 1
  givenname: Sawan
  surname: Singh
  fullname: Singh, Sawan
  email: singh.sawan@um.es
  organization: University of Murcia,Computer Engineering Department,Murcia,Spain
– sequence: 2
  givenname: Arthur
  surname: Perais
  fullname: Perais, Arthur
  email: arthur.perais@univ-grenoble-alpes.fr
  organization: Univ. Grenoble Alpes, CNRS, Grenoble INP, TIMA,Grenoble,France
– sequence: 3
  givenname: Alexandra
  surname: Jimborean
  fullname: Jimborean, Alexandra
  email: alexandra.jimborean@um.es
  organization: University of Murcia,Computer Engineering Department,Murcia,Spain
– sequence: 4
  givenname: Alberto
  surname: Ros
  fullname: Ros, Alberto
  email: aros@ditec.um.es
  organization: University of Murcia,Computer Engineering Department,Murcia,Spain
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/ISCA59077.2024.00092
EISBN 9798350326581
EndPage 1245
ExternalDocumentID 10609647
Genre orig-research
GrantInformation_xml – fundername: European Research Council
  funderid: 10.13039/501100000781
ISICitedReferencesCount 3
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed false
IsScholarly true
Language English
OpenAccessLink https://hal.science/hal-04675260/document
PageCount 16
PublicationCentury 2000
PublicationDate 2024-June-29
PublicationDateYYYYMMDD 2024-06-29
PublicationDate_xml – month: 06
  year: 2024
  text: 2024-June-29
  day: 29
PublicationDecade 2020
PublicationTitle 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)
PublicationTitleAbbrev ISCA
PublicationYear 2024
Publisher IEEE
StartPage 1230
SubjectTerms Codes
Computer architecture
core design
Decoding
Focusing
hard-to-predict branches
Micro-op Cache
Pipelines
Prefetching
processor front-end
Proposals
Title Alternate Path μ-op Cache Prefetching
URI https://ieeexplore.ieee.org/document/10609647