Prodigy: Improving the Memory Latency of Data-Indirect Irregular Workloads Using Hardware-Software Co-Design
| Published in: | Proceedings - International Symposium on High-Performance Computer Architecture, pp. 654-667 |
|---|---|
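| Main authors: | Talati, Nishil; May, Kyle; Behroozi, Armand; Yang, Yichen; Kaszyk, Kuba; Vasiladiotis, Christos; Verma, Tarunesh; Li, Lu; Nguyen, Brandon; Sun, Jiawen; Morton, John Magnus; Ahmadi, Agreen; Austin, Todd; O'Boyle, Michael; Mahlke, Scott; Mudge, Trevor; Dreslinski, Ronald |
| Format: | Conference proceeding |
| Language: | English |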
| Published: | IEEE, 1 February 2021 |
| ISSN: | 2378-203X |
| Online access: | Full text |
| Abstract | Irregular workloads are typically bottlenecked by the memory system. These workloads often use sparse data representations, e.g., compressed sparse row/column (CSR/CSC), to conserve space at the cost of complicated, irregular traversals. Such traversals access large volumes of data and offer little locality for caches and conventional prefetchers to exploit. This paper presents Prodigy, a low-cost hardware-software co-design solution for intelligent prefetching that improves the memory latency of several important irregular workloads. Prodigy targets irregular workloads, including graph analytics, sparse linear algebra, and fluid mechanics, that exhibit two specific types of data-dependent memory access patterns. Prodigy adopts a "best of both worlds" approach, using static program information from software and dynamic run-time information from hardware. The core of the system is the Data Indirection Graph (DIG), a proposed compact representation that expresses program semantics such as the layout and memory access patterns of key data structures. The DIG representation is agnostic to a particular data structure format and is demonstrated to work with several sparse formats, including CSR and CSC. Program semantics are automatically captured with a compiler pass, encoded as a DIG, and inserted into the application binary. The DIG is then used to program a low-cost hardware prefetcher that fetches data according to an irregular algorithm's data structure traversal pattern. We equip the prefetcher with a flexible prefetching algorithm that maintains timeliness by dynamically adapting its prefetch distance to an application's execution pace. We evaluate the performance, energy consumption, and transistor cost of Prodigy using a variety of algorithms from the GAP, HPCG, and NAS benchmark suites, and compare it against a non-prefetching baseline as well as state-of-the-art prefetchers. We show that, using just 0.8 KB of storage, Prodigy outperforms a non-prefetching baseline by 2.6× and saves energy by 1.6×, on average. Prodigy also outperforms modern data prefetchers by 1.5-2.3×. |
|---|---|
| Author | Austin, Todd; Yang, Yichen; Mahlke, Scott; Nguyen, Brandon; Mudge, Trevor; Dreslinski, Ronald; Ahmadi, Agreen; Morton, John Magnus; Behroozi, Armand; Sun, Jiawen; Talati, Nishil; Verma, Tarunesh; May, Kyle; O'Boyle, Michael; Li, Lu; Vasiladiotis, Christos; Kaszyk, Kuba |
| Author_xml | 1. Talati, Nishil (University of Michigan; talatin@umich.edu); 2. May, Kyle (University of Michigan); 3. Behroozi, Armand (University of Michigan); 4. Yang, Yichen (University of Michigan); 5. Kaszyk, Kuba (University of Edinburgh); 6. Vasiladiotis, Christos (University of Edinburgh); 7. Verma, Tarunesh (University of Michigan); 8. Li, Lu (University of Edinburgh); 9. Nguyen, Brandon (University of Michigan); 10. Sun, Jiawen (University of Edinburgh); 11. Morton, John Magnus (University of Edinburgh); 12. Ahmadi, Agreen (University of Michigan); 13. Austin, Todd (University of Michigan); 14. O'Boyle, Michael (University of Edinburgh); 15. Mahlke, Scott (University of Michigan); 16. Mudge, Trevor (University of Michigan); 17. Dreslinski, Ronald (University of Michigan) |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/HPCA51647.2021.00061 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Xplore; IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 1665422351 9781665422352 |
| EISSN | 2378-203X |
| EndPage | 667 |
| ExternalDocumentID | 9407222 |
| Genre | orig-research |
| GrantInformation_xml | Defense Advanced Research Projects Agency (funder ID 10.13039/100000185); National Science Foundation (funder ID 10.13039/100000001) |
| GroupedDBID | 29O 6IE 6IF 6IH 6IK 6IL 6IM 6IN AAJGR AAWTH ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IPLJI M43 OCL RIE RIL RNS |
| ID | FETCH-LOGICAL-i203t-c46e5f94e672853fe8768544cd745be6ca500e023fdb83053a68235da7f4ae8a3 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 39 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000671076000049&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Wed Aug 27 02:28:04 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-i203t-c46e5f94e672853fe8768544cd745be6ca500e023fdb83053a68235da7f4ae8a3 |
| PageCount | 14 |
| ParticipantIDs | ieee_primary_9407222 |
| PublicationCentury | 2000 |
| PublicationDate | 2021-Feb. |
| PublicationDateYYYYMMDD | 2021-02-01 |
| PublicationDate_xml | – month: 02 year: 2021 text: 2021-Feb. |
| PublicationDecade | 2020 |
| PublicationTitle | Proceedings - International Symposium on High-Performance Computer Architecture |
| PublicationTitleAbbrev | HPCA |
| PublicationYear | 2021 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssj0002951 |
| Score | 2.4258292 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 654 |
| SubjectTerms | compiler and hardware prefetching; Data structures; DRAM stalls; graph processing; hardware-software co-design; Heuristic algorithms; irregular workloads; Layout; Prefetching; programmer annotations; programming model; Random access memory; Semantics; Software algorithms |
| Title | Prodigy: Improving the Memory Latency of Data-Indirect Irregular Workloads Using Hardware-Software Co-Design |
| URI | https://ieeexplore.ieee.org/document/9407222 |
| WOSCitedRecordID | wos000671076000049&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | IEEE |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=proceeding&rft.title=Proceedings+-+International+Symposium+on+High-Performance+Computer+Architecture&rft.atitle=Prodigy%3A+Improving+the+Memory+Latency+of+Data-Indirect+Irregular+Workloads+Using+Hardware-Software+Co-Design&rft.au=Talati%2C+Nishil&rft.au=May%2C+Kyle&rft.au=Behroozi%2C+Armand&rft.au=Yang%2C+Yichen&rft.date=2021-02-01&rft.pub=IEEE&rft.eissn=2378-203X&rft.spage=654&rft.epage=667&rft_id=info:doi/10.1109%2FHPCA51647.2021.00061&rft.externalDocID=9407222 |
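
The abstract describes a Data Indirection Graph (DIG) that records how key arrays index into one another, which a hardware prefetcher then walks to fetch data ahead of an irregular traversal. The Python sketch below is a small software model of that idea for a CSR graph. It is not the paper's design. Everything it names is a hypothetical choice made for illustration: the node fields (`base`, `elem_size`), the labels `"single"` and `"ranged"` for the two indirection kinds, the functions `prefetch_addresses` and `build_csr_dig`, and the fixed `distance` parameter. Prodigy adapts its prefetch distance at run time, but this sketch keeps it constant. The model also reads array contents directly to chase indirections. Real hardware has to wait for each fetched line before it can compute the next dependent address.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical labels for two indirection kinds (not taken from the paper).
SINGLE = "single"   # src[i] is an index into dst
RANGED = "ranged"   # src[i] .. src[i+1] bounds a range of indices into dst


@dataclass
class ArrayNode:
    """One DIG node: a key array (fields are illustrative assumptions)."""
    name: str
    base: int          # assumed base address of the array
    elem_size: int     # bytes per element
    data: List[int]    # contents, read here to resolve indirections

    def addr(self, i: int) -> int:
        return self.base + i * self.elem_size


@dataclass
class DIG:
    nodes: Dict[str, ArrayNode] = field(default_factory=dict)
    # src array name -> list of (indirection kind, dst array name)
    edges: Dict[str, List[Tuple[str, str]]] = field(default_factory=dict)

    def add_node(self, node: ArrayNode) -> None:
        self.nodes[node.name] = node
        self.edges.setdefault(node.name, [])

    def add_edge(self, src: str, dst: str, kind: str) -> None:
        if kind not in (SINGLE, RANGED):
            raise ValueError(f"unknown indirection kind: {kind}")
        self.edges[src].append((kind, dst))


def prefetch_addresses(dig: DIG, trigger: str, index: int, distance: int) -> List[int]:
    """Addresses to prefetch when the core touches trigger[index].

    Looks `distance` elements ahead in the trigger array, then follows
    the DIG edges depth-first, chasing each indirection.
    """
    out: List[int] = []

    def visit(name: str, i: int) -> None:
        node = dig.nodes[name]
        if not 0 <= i < len(node.data):
            return  # out of bounds: nothing to prefetch
        out.append(node.addr(i))
        for kind, dst in dig.edges[name]:
            if kind == SINGLE:
                visit(dst, node.data[i])
            elif i + 1 < len(node.data):  # RANGED also needs the upper bound
                out.append(node.addr(i + 1))
                for j in range(node.data[i], node.data[i + 1]):
                    visit(dst, j)

    visit(trigger, index + distance)
    return out


def build_csr_dig() -> DIG:
    """Tiny CSR graph: 4 vertices, 6 edges, plus per-vertex `visited` data."""
    dig = DIG()
    dig.add_node(ArrayNode("offsets", 0x1000, 8, [0, 2, 3, 5, 6]))
    dig.add_node(ArrayNode("neighbors", 0x2000, 4, [1, 2, 2, 0, 3, 1]))
    dig.add_node(ArrayNode("visited", 0x3000, 4, [0, 0, 0, 0]))
    dig.add_edge("offsets", "neighbors", RANGED)
    dig.add_edge("neighbors", "visited", SINGLE)
    return dig
```

For example, `prefetch_addresses(build_csr_dig(), "offsets", 0, 1)` returns the addresses of `offsets[1]`, `offsets[2]`, `neighbors[2]` and `visited[2]`. These are the locations that processing vertex 1 will touch, fetched while the core is still working on vertex 0.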