Prodigy: Improving the Memory Latency of Data-Indirect Irregular Workloads Using Hardware-Software Co-Design

Bibliographic details
Published in: Proceedings - International Symposium on High-Performance Computer Architecture, pp. 654-667
Authors: Talati, Nishil; May, Kyle; Behroozi, Armand; Yang, Yichen; Kaszyk, Kuba; Vasiladiotis, Christos; Verma, Tarunesh; Li, Lu; Nguyen, Brandon; Sun, Jiawen; Morton, John Magnus; Ahmadi, Agreen; Austin, Todd; O'Boyle, Michael; Mahlke, Scott; Mudge, Trevor; Dreslinski, Ronald
Format: Conference proceeding
Language: English
Published: IEEE, 1 February 2021
ISSN: 2378-203X
Abstract: Irregular workloads are typically bottlenecked by the memory system. These workloads often use sparse data representations, e.g., compressed sparse row/column (CSR/CSC), to conserve space at the cost of complicated, irregular traversals. Such traversals access large volumes of data and offer little locality for caches and conventional prefetchers to exploit. This paper presents Prodigy, a low-cost hardware-software co-design solution for intelligent prefetching to improve the memory latency of several important irregular workloads. Prodigy targets irregular workloads, including graph analytics, sparse linear algebra, and fluid mechanics, that exhibit two specific types of data-dependent memory access patterns. Prodigy adopts a "best of both worlds" approach by using static program information from software and dynamic run-time information from hardware. The core of the system is the Data Indirection Graph (DIG), a proposed compact representation used to express program semantics such as the layout and memory access patterns of key data structures. The DIG representation is agnostic to a particular data structure format and is demonstrated to work with several sparse formats, including CSR and CSC. Program semantics are automatically captured with a compiler pass, encoded as a DIG, and inserted into the application binary. The DIG is then used to program a low-cost hardware prefetcher to fetch data according to an irregular algorithm's data structure traversal pattern. We equip the prefetcher with a flexible prefetching algorithm that maintains timeliness by dynamically adapting its prefetch distance to an application's execution pace. We evaluate the performance, energy consumption, and transistor cost of Prodigy using a variety of algorithms from the GAP, HPCG, and NAS benchmark suites, and we compare the performance of Prodigy against a non-prefetching baseline as well as state-of-the-art prefetchers. We show that, using just 0.8 KB of storage, Prodigy outperforms a non-prefetching baseline by 2.6× and saves energy by 1.6×, on average. Prodigy also outperforms modern data prefetchers by 1.5-2.3×.
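To make "data-dependent memory access patterns" over CSR concrete, here is a minimal Python sketch on a toy 4-vertex graph. The first function shows two common indirect accesses in CSR code: loop bounds loaded from `offsets`, and a loaded value `neighbors[e]` used as an index into `rank`. These are illustrative examples, not necessarily the paper's own taxonomy of its "two specific types" of patterns. The second half is a toy software model of how a DIG-like description (arrays as nodes, indirections as edges) could turn one trigger index into a chain of prefetch addresses. `ArrayNode`, `Edge`, `prefetch_chain`, the edge kinds `"range"`/`"index"`, the base addresses, and the element sizes are all hypothetical and do not reflect Prodigy's actual DIG encoding or hardware interface.

```python
from __future__ import annotations

from dataclasses import dataclass

# Toy directed graph in CSR form (hypothetical data: 4 vertices, 5 edges).
# Vertex v's neighbors are neighbors[offsets[v]:offsets[v + 1]].
offsets = [0, 2, 3, 5, 5]
neighbors = [1, 2, 3, 0, 3]
rank = [0.4, 0.1, 0.3, 0.2]  # per-vertex property


def neighbor_sum(v: int) -> float:
    """Gather a property over v's neighbors, as in one PageRank-style step."""
    total = 0.0
    # Data-dependent loop bounds: both ends of the range are loaded from offsets.
    for e in range(offsets[v], offsets[v + 1]):
        # Data-dependent address: the loaded value neighbors[e] indexes rank.
        total += rank[neighbors[e]]
    return total


# Toy DIG-like description of the same layout. All names, base addresses,
# and element sizes are hypothetical, not Prodigy's actual encoding.
@dataclass
class ArrayNode:
    name: str
    base: int       # assumed base address of the array
    elem_size: int  # assumed bytes per element
    data: list      # stands in for memory contents in this software model

    def addr(self, i: int) -> int:
        return self.base + i * self.elem_size


@dataclass
class Edge:
    src: str
    dst: str
    kind: str  # "range": src[i]..src[i+1] bounds a span of dst; "index": dst[src[i]]


NODES = {
    "offsets": ArrayNode("offsets", 0x1000, 8, offsets),
    "neighbors": ArrayNode("neighbors", 0x2000, 4, neighbors),
    "rank": ArrayNode("rank", 0x3000, 8, rank),
}
EDGES = [Edge("offsets", "neighbors", "range"), Edge("neighbors", "rank", "index")]


def prefetch_chain(start: str, i: int) -> list[int]:
    """Walk EDGES (listed in traversal order) from one trigger index and
    return the byte addresses a DIG-driven prefetcher could request."""
    node, idxs, addrs = NODES[start], [i], []
    for edge in EDGES:
        if edge.src != node.name:
            continue
        dst = NODES[edge.dst]
        if edge.kind == "range":
            nxt = [j for k in idxs for j in range(node.data[k], node.data[k + 1])]
        else:  # "index"
            nxt = [node.data[k] for k in idxs]
        addrs.extend(dst.addr(j) for j in nxt)
        node, idxs = dst, nxt
    return addrs


print([hex(a) for a in prefetch_chain("offsets", 0)])
# ['0x2000', '0x2004', '0x3008', '0x3010']
```

The point of the model is that, once the layout and the indirection edges are known ahead of time, every address in the chain can be computed from a single trigger index plus the values already loaded along the way. A conventional stride or delta prefetcher cannot infer this from the address stream alone.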
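The abstract states that Prodigy keeps prefetches timely by adapting its prefetch distance to the application's execution pace, but it does not describe the mechanism. As a hedged illustration of the general idea only, the sketch below uses the classic rule "distance = memory latency / time per iteration", with the pace tracked as an exponentially weighted moving average of the gap between consecutive trigger accesses. The class `PaceAdaptiveDistance`, the 200-cycle latency, the EWMA weight, and the clamp bounds are all assumptions for illustration. This is not Prodigy's algorithm.

```python
import math


class PaceAdaptiveDistance:
    """Hypothetical controller: prefetch distance ~= memory latency / time per iteration.

    A generic stand-in that updates the pace estimate online with an EWMA;
    it is not Prodigy's own prefetching algorithm.
    """

    def __init__(self, mem_latency: int, d_min: int = 1, d_max: int = 64, alpha: float = 0.25):
        self.mem_latency = mem_latency  # assumed DRAM latency, in cycles
        self.d_min, self.d_max = d_min, d_max
        self.alpha = alpha              # weight given to the newest pace sample
        self.avg_gap = None             # smoothed cycles between trigger accesses
        self.last = None                # cycle of the previous trigger access

    def on_trigger(self, now: int) -> int:
        """Record a demand access at cycle `now`; return the updated distance."""
        if self.last is not None:
            gap = max(1, now - self.last)
            if self.avg_gap is None:
                self.avg_gap = float(gap)
            else:
                self.avg_gap = (1 - self.alpha) * self.avg_gap + self.alpha * gap
        self.last = now
        return self.distance()

    def distance(self) -> int:
        if self.avg_gap is None:
            return self.d_min  # no pace estimate yet
        d = math.ceil(self.mem_latency / self.avg_gap)
        return max(self.d_min, min(self.d_max, d))


# Usage: a loop that speeds up from one trigger every 50 cycles to one every 10.
ctl = PaceAdaptiveDistance(mem_latency=200)
history = [ctl.on_trigger(t) for t in (0, 50, 100, 150, 160, 170, 180, 190)]
print(history)  # [1, 4, 4, 4, 5, 7, 8, 9]
```

A prefetcher built this way would request data for the element `d` iterations ahead of the current trigger index. The EWMA keeps one noisy gap from swinging the distance, while the clamp prevents very fast phases from running far ahead and polluting the cache.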
Author affiliations:
University of Michigan: Nishil Talati (talatin@umich.edu), Kyle May, Armand Behroozi, Yichen Yang, Tarunesh Verma, Brandon Nguyen, Agreen Ahmadi, Todd Austin, Scott Mahlke, Trevor Mudge, Ronald Dreslinski
University of Edinburgh: Kuba Kaszyk, Christos Vasiladiotis, Lu Li, Jiawen Sun, John Magnus Morton, Michael O'Boyle
DOI: 10.1109/HPCA51647.2021.00061
Discipline: Computer Science
EISBN: 1665422351, 9781665422352
Funding: Defense Advanced Research Projects Agency (Funder ID 10.13039/100000185); National Science Foundation (Funder ID 10.13039/100000001)
Times cited (Web of Science): 39
Conference abbreviation: HPCA
Subject terms: hardware prefetching, compiler, data structures, DRAM stalls, graph processing, hardware-software co-design, heuristic algorithms, irregular workloads, layout, prefetching, programmer annotations, programming model, random access memory, semantics, software algorithms
URI: https://ieeexplore.ieee.org/document/9407222