IMP: Indirect memory prefetcher

Bibliographic details
Published in: 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 178-190
Main authors: Xiangyao Yu; Hughes, Christopher J.; Satish, Nadathur; Devadas, Srinivas
Format: Conference paper
Language: English
Published: ACM, 1 December 2015
ISSN: 2379-3155
Abstract Machine learning, graph analytics and sparse linear algebra-based applications are dominated by irregular memory accesses resulting from following edges in a graph or non-zero elements in a sparse matrix. These accesses have little temporal or spatial locality, and thus incur long memory stalls and large bandwidth requirements. A traditional streaming or striding prefetcher cannot capture these irregular access patterns. A majority of these irregular accesses come from indirect patterns of the form A[B[j]]. We propose an efficient hardware indirect memory prefetcher (IMP) to capture this access pattern and hide latency. We also propose a partial cacheline accessing mechanism for these prefetches to reduce the network and DRAM bandwidth pressure from the lack of spatial locality. Evaluated on 7 applications, IMP shows 56% speedup on average (up to 2.3×) compared to a baseline 64 core system with streaming prefetchers. This is within 23% of an idealized system. With partial cacheline accessing, we see another 9.4% speedup on average (up to 46.6%).
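The abstract's key observation is that an indirect access A[B[j]] lands at the address base_of_A + B[j] × element_size. Once a few (B[j], address) pairs have been observed, a prefetcher can therefore solve for the base and element size, then issue a prefetch for A[B[j+k]] as soon as B[j+k] is known. The Python sketch below illustrates only that reasoning in software; it is not the IMP hardware described in the paper. The function names, the candidate element sizes (1, 2, 4 and 8 bytes), the look-ahead parameter `distance`, and the example memory layout are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple

# Assumed candidate element sizes of 1, 2, 4 and 8 bytes, expressed as shifts.
CANDIDATE_SHIFTS = (0, 1, 2, 3)


def detect_indirect_pattern(
    index_values: Sequence[int], miss_addrs: Sequence[int]
) -> Optional[Tuple[int, int]]:
    """Find (base, shift) with addr == base + (idx << shift) for every pair.

    index_values[i] is a value loaded from B, and miss_addrs[i] is the
    address of the A access that followed it. At least two pairs are
    needed, because a single pair fits every candidate shift.
    """
    if len(index_values) < 2 or len(index_values) != len(miss_addrs):
        return None
    for shift in CANDIDATE_SHIFTS:
        # Solve for the base using the first pair, then verify the rest.
        base = miss_addrs[0] - (index_values[0] << shift)
        if all(addr == base + (idx << shift)
               for idx, addr in zip(index_values, miss_addrs)):
            return base, shift
    return None


def prefetch_targets(
    b: Sequence[int], j: int, distance: int, base: int, shift: int
) -> List[int]:
    """Addresses of A[B[j+1]] .. A[B[j+distance]], clipped at the end of B."""
    upcoming = b[j + 1 : j + 1 + distance]
    return [base + (idx << shift) for idx in upcoming]


if __name__ == "__main__":
    # Hypothetical layout: A starts at 0x10000 and holds 8-byte elements.
    b = [5, 17, 2, 40, 9, 33]
    observed = [0x10000 + 8 * idx for idx in b[:3]]
    pattern = detect_indirect_pattern(b[:3], observed)
    print(pattern)  # (65536, 3)
    if pattern is not None:
        base, shift = pattern
        # Prints ['0x10140', '0x10048'], the addresses of A[40] and A[9].
        print([hex(a) for a in prefetch_targets(b, 2, 2, base, shift)])
```

Checking every observed pair, rather than accepting the first shift that fits one pair, keeps an index stream that only coincidentally matches from being mistaken for an indirect pattern. A hardware prefetcher would have to perform this detection with bounded state, which this sketch does not attempt to model.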
Author Hughes, Christopher J.
Xiangyao Yu
Satish, Nadathur
Devadas, Srinivas
Author_xml – sequence: 1
  surname: Xiangyao Yu
  fullname: Xiangyao Yu
  email: yxy@mit.edu
  organization: Massachusetts Inst. of Technol., Cambridge, MA, USA
– sequence: 2
  givenname: Christopher J.
  surname: Hughes
  fullname: Hughes, Christopher J.
  email: christopher.j.hughes@intel.com
  organization: Parallel Comput. Lab., Intel Labs., Santa Clara, CA, USA
– sequence: 3
  givenname: Nadathur
  surname: Satish
  fullname: Satish, Nadathur
  email: nadathur.rajagopalan.satish@intel.com
  organization: Parallel Comput. Lab., Intel Labs., Santa Clara, CA, USA
– sequence: 4
  givenname: Srinivas
  surname: Devadas
  fullname: Devadas, Srinivas
  email: devadas@mit.edu
  organization: Massachusetts Inst. of Technol., Cambridge, MA, USA
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1145/2830772.2830807
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Xplore
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 1450340342
9781450340342
EISSN 2379-3155
EndPage 190
ExternalDocumentID 7856597
Genre orig-research
GroupedDBID 6IE
6IL
ABLEC
ALMA_UNASSIGNED_HOLDINGS
CBEJK
IEGSK
RIE
RIL
IEDL.DBID RIE
ISICitedReferencesCount 112
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000393287300015&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Aug 27 02:02:01 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
PageCount 13
ParticipantIDs ieee_primary_7856597
PublicationCentury 2000
PublicationDate 2015-Dec.
PublicationDateYYYYMMDD 2015-12-01
PublicationDate_xml – month: 12
  year: 2015
  text: 2015-Dec.
PublicationDecade 2010
PublicationTitle 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
PublicationTitleAbbrev MICRO
PublicationYear 2015
Publisher ACM
Publisher_xml – name: ACM
SSID ssib030238632
ssib023363937
ssib042476800
Score 2.3818152
SourceID ieee
SourceType Publisher
StartPage 178
SubjectTerms Arrays
Bandwidth
Hardware
Indexes
Multicore processing
Prefetching
Sparse matrices
Title IMP: Indirect memory prefetcher
URI https://ieeexplore.ieee.org/document/7856597
WOSCitedRecordID wos000393287300015&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=proceeding&rft.title=2015+48th+Annual+IEEE%2FACM+International+Symposium+on+Microarchitecture+%28MICRO%29&rft.atitle=IMP%3A+Indirect+memory+prefetcher&rft.au=Xiangyao+Yu&rft.au=Hughes%2C+Christopher+J.&rft.au=Satish%2C+Nadathur&rft.au=Devadas%2C+Srinivas&rft.date=2015-12-01&rft.pub=ACM&rft.eissn=2379-3155&rft.spage=178&rft.epage=190&rft_id=info:doi/10.1145%2F2830772.2830807&rft.externalDocID=7856597