Load value prediction via path-based address prediction: avoiding mispredictions due to conflicting stores
| Published in: | MICRO-50 : the 50th annual IEEE/ACM International Symposium on Microarchitecture : proceedings : October 14-18, 2017, Cambridge, MA, pp. 423-435 |
|---|---|
| Main authors: | Sheikh, Rami; Cain, Harold W.; Damodaran, Raguram |
| Format: | Conference paper |
| Language: | English |
| Published: | New York, NY, USA: ACM, 14 October 2017 |
| Series: | ACM Conferences |
| Subjects: | |
| ISBN: | 1450349528, 9781450349529 |
| ISSN: | 2379-3155 |
| Online access: | Get full text |
| Abstract | Current flagship processors excel at extracting instruction-level parallelism (ILP) by forming large instruction windows. Even so, ILP extraction is inherently limited by true data dependencies. Value prediction was proposed to address this limitation. Value prediction faces many challenges; in this work we focus on two of them. Challenge #1: store instructions change the values in memory, which renders the values held in the value predictor stale and results in value mispredictions and a retraining penalty. Challenge #2: value mispredictions trigger costly pipeline flushes. To minimize the number of pipeline flushes, value predictors employ stringent, yet necessary, high-confidence requirements to guarantee high prediction accuracy. Such requirements can negatively impact training time and coverage. In this work, we propose Decoupled Load Value Prediction (DLVP), a technique that targets the value prediction challenges for load instructions. DLVP mitigates the stale state caused by stores by replacing value prediction with memory address prediction. It then opportunistically probes the data cache to retrieve the value(s) corresponding to the predicted address(es) early enough for value prediction to take place. Since the values captured in the data cache mirror the current program data (except for in-flight stores), this addresses the first challenge. Regarding the second challenge, DLVP reduces pipeline flushes by using a new context-based address prediction scheme that leverages load-path history to deliver high address prediction accuracy (over 99%) with relaxed confidence requirements. We call this address prediction scheme Path-based Address Prediction (PAP). With a modest 8KB prediction table, DLVP improves performance by up to 71%, and by 4.8% on average, without increasing the core energy consumption. |
|---|---|
| AbstractList | CCS CONCEPTS: Computer systems organization → Superscalar architectures; Pipeline computing; Reduced instruction set computing |
| Author | Sheikh, Rami; Damodaran, Raguram; Cain, Harold W. |
| Author_xml | – sequence: 1 givenname: Rami surname: Sheikh fullname: Sheikh, Rami email: ralsheik@qti.qualcomm.com organization: Qualcomm Technologies, Inc – sequence: 2 givenname: Harold W. surname: Cain fullname: Cain, Harold W. email: tcain@qti.qualcomm.com organization: Qualcomm Datacenter Technologies, Inc – sequence: 3 givenname: Raguram surname: Damodaran fullname: Damodaran, Raguram email: raguramd@qti.qualcomm.com organization: Qualcomm Technologies, Inc |
| ContentType | Conference Proceeding |
| Copyright | 2017 ACM |
| Copyright_xml | – notice: 2017 ACM |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1145/3123939.3123951 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE/IET Electronic Library url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 1450349528 9781450349529 |
| EISSN | 2379-3155 |
| EndPage | 435 |
| ExternalDocumentID | 8686698 |
| Genre | orig-research |
| GroupedDBID | 6IE 6IF 6IL 6IN AAJGR ABLEC ACM ADPZR ALMA_UNASSIGNED_HOLDINGS APO BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK GUFHI IEGSK OCL RIB RIC RIE RIL AAWTH LHSKQ |
| ID | FETCH-LOGICAL-a247t-d2d74be177dcd6716062f43c37104036a5f9cb91c7b799c2f46df0b200367c633 |
| IEDL.DBID | RIE |
| ISBN | 1450349528 9781450349529 |
| ISICitedReferencesCount | 11 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000455679300032&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Thu May 29 05:57:38 EDT 2025; Wed Jan 31 06:40:42 EST 2024 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Keywords | value prediction; address prediction; path-based predictor; microarchitecture |
| Language | English |
| License | Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org |
| LinkModel | DirectLink |
| MeetingName | MICRO-50: The 50th Annual IEEE/ACM International Symposium on Microarchitecture |
| MergedId | FETCHMERGED-LOGICAL-a247t-d2d74be177dcd6716062f43c37104036a5f9cb91c7b799c2f46df0b200367c633 |
| PageCount | 13 |
| ParticipantIDs | acm_books_10_1145_3123939_3123951_brief acm_books_10_1145_3123939_3123951 ieee_primary_8686698 |
| PublicationCentury | 2000 |
| PublicationDate | 20171014; 2017-Oct. |
| PublicationDateYYYYMMDD | 2017-10-14; 2017-10-01 |
| PublicationDate_xml | – month: 10 year: 2017 text: 20171014 day: 14 |
| PublicationDecade | 2010 |
| PublicationPlace | New York, NY, USA |
| PublicationPlace_xml | – name: New York, NY, USA |
| PublicationSeriesTitle | ACM Conferences |
| PublicationTitle | MICRO-50 : the 50th annual IEEE/ACM International Symposium on Microarchitecture : proceedings : October 14-18, 2017, Cambridge, MA |
| PublicationTitleAbbrev | MICRO |
| PublicationYear | 2017 |
| Publisher | ACM |
| Publisher_xml | – name: ACM |
| SSID | ssj0002179613 ssib030238632 ssib042476800 ssib023363937 |
| Score | 2.174444 |
| Snippet | Current flagship processors excel at extracting instruction-level-parallelism (ILP) by forming large instruction windows. Even then, extracting ILP is... |
| SourceID | ieee acm |
| SourceType | Publisher |
| StartPage | 423 |
| SubjectTerms | Address Prediction; Computer systems organization -- Architectures -- Serial architectures -- Pipeline computing; Computer systems organization -- Architectures -- Serial architectures -- Reduced instruction set computing; Computer systems organization -- Architectures -- Serial architectures -- Superscalar architectures; History; Machinery; Microarchitecture; Path-based Predictor; Pipelines; Prefetching; Registers; Value Prediction |
| Subtitle | avoiding mispredictions due to conflicting stores |
| Title | Load value prediction via path-based address prediction |
| URI | https://ieeexplore.ieee.org/document/8686698 |
| WOSCitedRecordID | wos000455679300032&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | IEEE |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=Proceedings+of+the+50th+Annual+IEEE%2FACM+International+Symposium+on+Microarchitecture&rft.atitle=Load+value+prediction+via+path-based+address+prediction&rft.au=Sheikh%2C+Rami&rft.au=Cain%2C+Harold+W.&rft.au=Damodaran%2C+Raguram&rft.series=ACM+Conferences&rft.date=2017-10-14&rft.pub=ACM&rft.isbn=1450349528&rft.spage=423&rft.epage=435&rft_id=info:doi/10.1145%2F3123939.3123951 |
| thumbnail_l | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=9781450349529/lc.gif&client=summon&freeimage=true |
| thumbnail_m | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=9781450349529/mc.gif&client=summon&freeimage=true |
| thumbnail_s | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=9781450349529/sc.gif&client=summon&freeimage=true |
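
The abstract describes predicting a load's address from load-path history and then reading the value from the data cache, so that a committed store cannot leave a stale value in the predictor. The following is a minimal sketch of that idea only, not the design from the paper: the class name `PathAddressPredictor`, the table size, the hash, the path length, and the confidence threshold are all assumptions chosen for illustration, and in-flight stores are ignored. A dictionary stands in for the data cache, and a plain last-value predictor is included for contrast.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    tag: Optional[int] = None  # identifies the (load PC, path) context
    addr: int = 0              # last address observed in that context
    conf: int = 0              # saturating confidence counter


class PathAddressPredictor:
    """Toy address predictor indexed by load PC and recent load PCs.

    All parameters are hypothetical; path_len is assumed to be >= 1.
    """

    def __init__(self, entries=1024, path_len=2, threshold=2, conf_max=3):
        self.table = [Entry() for _ in range(entries)]
        self.path_len = path_len
        self.threshold = threshold
        self.conf_max = conf_max
        self.path = []  # PCs of the most recent loads, oldest first

    def _key(self, pc):
        # Fold the current PC with the load-path history into one key.
        key = pc
        for old_pc in self.path:
            key = (key * 31 + old_pc) & 0xFFFFFFFF
        return key

    def predict(self, pc):
        key = self._key(pc)
        entry = self.table[key % len(self.table)]
        if entry.tag == key and entry.conf >= self.threshold:
            return entry.addr
        return None  # not confident enough: make no prediction

    def train(self, pc, addr):
        key = self._key(pc)
        entry = self.table[key % len(self.table)]
        if entry.tag != key:
            entry.tag, entry.addr, entry.conf = key, addr, 0
        elif entry.addr == addr:
            entry.conf = min(entry.conf + 1, self.conf_max)
        else:
            entry.addr, entry.conf = addr, 0
        # Record this load in the path history after training.
        self.path = (self.path + [pc])[-self.path_len:]


def run_demo():
    cache = {0x1000: 7}   # stand-in for the data cache
    pap = PathAddressPredictor()
    last_seen = {}        # naive last-value predictor, for contrast
    load_pc = 0x4004A8    # hypothetical load instruction address
    dlvp, last_value, actual = [], [], []
    for i in range(8):
        if i == 6:
            cache[0x1000] = 99  # a committed store changes the data
        predicted_addr = pap.predict(load_pc)
        # Probe the cache with the predicted address to get the value.
        dlvp.append(None if predicted_addr is None else cache.get(predicted_addr))
        last_value.append(last_seen.get(load_pc))
        value = cache[0x1000]   # the load actually executes
        actual.append(value)
        pap.train(load_pc, 0x1000)
        last_seen[load_pc] = value
    return dlvp, last_value, actual


if __name__ == "__main__":
    for name, series in zip(("dlvp", "last_value", "actual"), run_demo()):
        print(name, series)
```

In this toy run the address-based scheme makes no prediction until its counter reaches the threshold, and after the store at iteration 6 it returns the new value because the value comes from the cache. The last-value predictor still holds the old value at that iteration and would mispredict.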

