IMP: Indirect memory prefetcher

Machine learning, graph analytics and sparse linear algebra-based applications are dominated by irregular memory accesses resulting from following edges in a graph or non-zero elements in a sparse matrix. These accesses have little temporal or spatial locality, and thus incur long memory stalls and...

Detailed description

Bibliographic details
Published in:	2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 178-190
Main authors:	Xiangyao Yu; Hughes, Christopher J.; Satish, Nadathur; Devadas, Srinivas
Format:	Conference proceeding
Language:	English
Published:	ACM, 2015-12-01
Subjects:	Arrays; Bandwidth; Hardware; Indexes; Multicore processing; Prefetching; Sparse matrices
ISSN:	2379-3155
Online access:	Full text

Abstract	Machine learning, graph analytics and sparse linear algebra-based applications are dominated by irregular memory accesses resulting from following edges in a graph or non-zero elements in a sparse matrix. These accesses have little temporal or spatial locality, and thus incur long memory stalls and large bandwidth requirements. A traditional streaming or striding prefetcher cannot capture these irregular access patterns. A majority of these irregular accesses come from indirect patterns of the form A[B[j]]. We propose an efficient hardware indirect memory prefetcher (IMP) to capture this access pattern and hide latency. We also propose a partial cacheline accessing mechanism for these prefetches to reduce the network and DRAM bandwidth pressure from the lack of spatial locality. Evaluated on 7 applications, IMP shows 56% speedup on average (up to 2.3×) compared to a baseline 64 core system with streaming prefetchers. This is within 23% of an idealized system. With partial cacheline accessing, we see another 9.4% speedup on average (up to 46.6%).
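
The indirect pattern A[B[j]] named in the abstract can be illustrated with a sparse matrix-vector product in CSR form. This is only a minimal software sketch of the access pattern, not the hardware mechanism the paper proposes; the function names `spmv_csr` and `indirect_targets`, the CSR layout, and the sample matrix are assumptions chosen for illustration.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Multiply a CSR sparse matrix by a dense vector x (illustrative only)."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        # j walks col_idx and values sequentially: a streaming access.
        for j in range(row_ptr[row], row_ptr[row + 1]):
            # x[col_idx[j]] has the form A[B[j]]: the location read in x
            # depends on a value loaded from col_idx, so consecutive
            # accesses to x follow no fixed stride.
            acc += values[j] * x[col_idx[j]]
        y[row] = acc
    return y


def indirect_targets(col_idx, start, distance):
    """Hypothetical helper: indices of x that the next `distance`
    iterations starting at position `start` will touch. Reading the
    index array ahead of use is the general idea behind prefetching
    indirect accesses; this is an assumption-level model only."""
    return col_idx[start:start + distance]


# Sample 3x4 matrix (assumed data):
# [[1, 0, 0, 2],
#  [0, 0, 3, 0],
#  [4, 0, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 3, 2, 0, 3]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0, 4.0]

y = spmv_csr(row_ptr, col_idx, values, x)
upcoming = indirect_targets(col_idx, 1, 2)
```

Here `col_idx` plays the role of B and `x` the role of A. The index array is read in order, which a streaming prefetcher handles well, while the reads of `x` jump between positions 0, 3, 2, 0, 3.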
Author_xml	1. Xiangyao Yu; yxy@mit.edu; Massachusetts Inst. of Technol., Cambridge, MA, USA
	2. Hughes, Christopher J.; christopher.j.hughes@intel.com; Parallel Comput. Lab., Intel Labs., Santa Clara, CA, USA
	3. Satish, Nadathur; nadathur.rajagopalan.satish@intel.com; Parallel Comput. Lab., Intel Labs., Santa Clara, CA, USA
	4. Devadas, Srinivas; devadas@mit.edu; Massachusetts Inst. of Technol., Cambridge, MA, USA
ContentType	Conference Proceeding
DOI	10.1145/2830772.2830807
EISBN	1450340342 9781450340342
EISSN	2379-3155
EndPage	190
ExternalDocumentID	7856597
Genre	orig-research
ISICitedReferencesCount	112
IsPeerReviewed	false
IsScholarly	true
Language	English
PageCount	13
PublicationCentury	2000
PublicationDate	2015-Dec.
PublicationDateYYYYMMDD	2015-12-01
PublicationDate_xml	– month: 12 year: 2015 text: 2015-Dec.
PublicationDecade	2010
PublicationTitle	2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
PublicationTitleAbbrev	MICRO
PublicationYear	2015
Publisher	ACM
Publisher_xml	– name: ACM
StartPage	178
SubjectTerms	Arrays Bandwidth Hardware Indexes Multicore processing Prefetching Sparse matrices
Title	IMP: Indirect memory prefetcher
URI	https://ieeexplore.ieee.org/document/7856597