ABC-DIMM: Alleviating the Bottleneck of Communication in DIMM-based Near-Memory Processing with Inter-DIMM Broadcast
| Published in: | Proceedings - International Symposium on Computer Architecture, pp. 237-250 |
|---|---|
| Main authors: | Sun, Weiyi; Li, Zhaoshi; Yin, Shouyi; Wei, Shaojun; Liu, Leibo |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 2021-06-01 |
| ISSN: | 2575-713X |
| Online access: | Full text (https://ieeexplore.ieee.org/document/9499805) |
| Abstract | Near-Memory Processing (NMP) systems that integrate accelerators within DIMM (Dual In-line Memory Module) buffer chips potentially provide high performance with relatively low design and manufacturing costs. However, an inevitable communication bottleneck arises on the main memory bus shared by peer DIMMs and the host CPU. This bottleneck stems from the bus-based nature and the limited point-to-point communication pattern of the main memory system. The aggregated memory bandwidth of DIMM-based NMP scales with the number of DIMMs. When the number of DIMMs in a channel scales up, the per-DIMM point-to-point communication bandwidth scales down, whereas the computation resources and local memory bandwidth per DIMM stay the same. For many important sparse data-intensive workloads like graph applications and sparse tensor algebra, we find that communication among DIMMs and the host CPU easily dominates processing in previous DIMM-based NMP systems, which severely bottlenecks their performance. To tackle this challenge, we propose that inter-DIMM broadcast should be implemented and utilized in the main memory system of DIMM-based NMP. On the hardware side, the main memory bus naturally scales out with broadcast, since the per-DIMM effective bandwidth of broadcast remains the same as the number of DIMMs grows. On the software side, many sparse applications can be implemented in a form such that broadcasts dominate their communication. Based on these ideas, we design ABC-DIMM, which Alleviates the Bottleneck of Communication in DIMM-based NMP. It consists of integral broadcast mechanisms and a Broadcast-Process programming framework, with minimal modifications to the commodity software-hardware stack. Our evaluation shows that ABC-DIMM offers an 8.33× geo-mean speedup over a 16-core CPU baseline, and outperforms two NMP baselines by 2.59× and 2.93× on average. |
|---|---|
| Author | Sun, Weiyi; Li, Zhaoshi; Yin, Shouyi; Wei, Shaojun; Liu, Leibo |
| Affiliation | Tsinghua University, Beijing National Research Center for Information Science and Technology (BNRist), School of Integrated Circuits, Beijing, China (all five authors) |
| Contact | Liu, Leibo: liulb@tsinghua.edu.cn |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DOI | 10.1109/ISCA52012.2021.00027 |
| Discipline | Computer Science |
| EISBN | 1665433337 (ISBN-10); 9781665433334 (ISBN-13) |
| EISSN | 2575-713X |
| EndPage | 250 |
| ExternalDocumentID | 9499805 |
| Genre | orig-research |
| Funding | National Natural Science Foundation of China (Funder ID: 10.13039/501100001809) |
| Web of Science citation count | 30 |
| Language | English |
| PageCount | 14 |
| PublicationDate | June 2021 (2021-06-01) |
| PublicationTitle | Proceedings - International Symposium on Computer Architecture |
| PublicationTitleAbbrev | ISCA |
| PublicationYear | 2021 |
| Publisher | IEEE |
| StartPage | 237 |
| SubjectTerms | Algebra; Bandwidth; Broadcast-Process framework; Computer architecture; inter-DIMM broadcast; Memory modules; near-memory processing; Programming; Software; sparse applications; Tensors |
| Title | ABC-DIMM: Alleviating the Bottleneck of Communication in DIMM-based Near-Memory Processing with Inter-DIMM Broadcast |
| URI | https://ieeexplore.ieee.org/document/9499805 |
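
The abstract's central hardware argument is that per-DIMM point-to-point bandwidth shrinks as DIMMs are added to a channel, while the per-DIMM effective bandwidth of a broadcast does not. A toy model makes this concrete. The Python sketch below is a hypothetical back-of-the-envelope calculation, not the authors' model or simulator. The function names are assumptions chosen for illustration, and so is the 25.6 GB/s channel figure (one 64-bit DDR4-3200 channel).

```python
"""Toy model of the per-DIMM bandwidth argument from the ABC-DIMM abstract.

Hypothetical assumptions (not taken from the paper):
  * one memory channel whose shared data bus moves BUS_GBPS gigabytes per second;
  * point-to-point transfers to different DIMMs are serialized on that bus;
  * a broadcast is a single bus transfer that every DIMM on the channel receives.
"""

# Assumed illustrative figure: one 64-bit DDR4-3200 channel, 3200 MT/s x 8 bytes.
BUS_GBPS = 25.6


def _check(num_dimms: int) -> None:
    """Reject channel configurations without any DIMM."""
    if num_dimms < 1:
        raise ValueError("a channel needs at least one DIMM")


def point_to_point_per_dimm(num_dimms: int, bus_gbps: float = BUS_GBPS) -> float:
    """Per-DIMM bandwidth (GB/s) when the shared bus is time-sliced among the DIMMs."""
    _check(num_dimms)
    return bus_gbps / num_dimms


def broadcast_per_dimm(num_dimms: int, bus_gbps: float = BUS_GBPS) -> float:
    """Effective per-DIMM bandwidth (GB/s) when every transfer is a broadcast.

    One bus transfer lands in all DIMMs at once, so each DIMM receives data at
    the full bus rate regardless of how many DIMMs share the channel.
    """
    _check(num_dimms)
    return bus_gbps


def seconds_to_deliver(bytes_per_dimm: float, num_dimms: int, broadcast: bool,
                       bus_gbps: float = BUS_GBPS) -> float:
    """Time to place the same bytes_per_dimm bytes into every DIMM on the channel."""
    rate = broadcast_per_dimm if broadcast else point_to_point_per_dimm
    return bytes_per_dimm / (rate(num_dimms, bus_gbps) * 1e9)


if __name__ == "__main__":
    # Compare the two communication styles as DIMMs are added to one channel.
    for n in (1, 2, 4, 8):
        p2p = point_to_point_per_dimm(n)
        bcast = broadcast_per_dimm(n)
        print(f"{n} DIMM(s): point-to-point {p2p:5.1f} GB/s per DIMM, "
              f"broadcast {bcast:5.1f} GB/s per DIMM")
```

Under these assumptions, going from one to eight DIMMs on the channel cuts each DIMM's point-to-point share from 25.6 to 3.2 GB/s, while the broadcast rate stays at 25.6 GB/s. The model deliberately ignores command/address overhead, bus turnaround, rank switching, and the return path from DIMMs to the host. It therefore only illustrates the scaling trend the abstract describes, not ABC-DIMM's actual broadcast mechanism.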