ABC-DIMM: Alleviating the Bottleneck of Communication in DIMM-based Near-Memory Processing with Inter-DIMM Broadcast

Detailed description

Bibliographic details
Published in: Proceedings - International Symposium on Computer Architecture, pp. 237-250
Main authors: Sun, Weiyi; Li, Zhaoshi; Yin, Shouyi; Wei, Shaojun; Liu, Leibo
Format: Conference proceeding
Language: English
Published: IEEE, 1 June 2021
Subjects:
ISSN: 2575-713X
Online access: Full text
Abstract Near-Memory Processing (NMP) systems that integrate accelerators within DIMM (Dual-Inline Memory Module) buffer chips potentially provide high performance with relatively low design and manufacturing costs. However, an inevitable communication bottleneck arises when considering the main memory bus among peer DIMMs and the host CPU. This communication bottleneck is rooted in the bus-based nature and the limited point-to-point communication pattern of the main memory system. The aggregated memory bandwidth of DIMM-based NMP scales with the number of DIMMs. When the number of DIMMs in a channel scales up, the per-DIMM point-to-point communication bandwidth scales down, whereas the computation resources and local memory bandwidth per DIMM stay the same. For many important sparse data-intensive workloads like graph applications and sparse tensor algebra, we identify that communication among DIMMs and the host CPU easily dominates their processing procedure in previous DIMM-based NMP systems, which severely bottlenecks their performance. To tackle this challenge, we propose that inter-DIMM broadcast should be implemented and utilized in the main memory system of DIMM-based NMP. On the hardware side, the main memory bus naturally scales out with broadcast, where the per-DIMM effective bandwidth of broadcast remains the same as the number of DIMMs grows. On the software side, many sparse applications can be implemented in a form such that broadcasts dominate their communication. Based on these ideas, we design ABC-DIMM, which Alleviates the Bottleneck of Communication in DIMM-based NMP, consisting of integral broadcast mechanisms and a Broadcast-Process programming framework, with minimized modifications to the commodity software-hardware stack. Our evaluation shows that ABC-DIMM offers an 8.33× geo-mean speedup over a 16-core CPU baseline, and outperforms two NMP baselines by 2.59× and 2.93× on average.
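The scaling argument in the abstract can be illustrated with a small idealized model. This is only a sketch under simplifying assumptions, not the paper's analytical model: it assumes one shared channel whose bus time is divided evenly among receivers for point-to-point transfers, while a broadcast transfer is observed by every DIMM at once. The function names and the bandwidth figures (16.0 GB/s) are hypothetical and chosen purely for illustration.

```python
def p2p_bandwidth_per_dimm(channel_gbps: float, n_dimms: int) -> float:
    """Idealized per-DIMM point-to-point bandwidth on a shared channel.

    Assumption: the bus serves one receiver at a time, so the channel
    bandwidth is split evenly across the DIMMs.
    """
    if n_dimms <= 0:
        raise ValueError("n_dimms must be positive")
    return channel_gbps / n_dimms


def broadcast_bandwidth_per_dimm(channel_gbps: float, n_dimms: int) -> float:
    """Idealized per-DIMM effective bandwidth when every transfer is a broadcast.

    Assumption: all DIMMs on the bus receive the same transfer, so each
    one sees the full channel bandwidth regardless of n_dimms.
    """
    if n_dimms <= 0:
        raise ValueError("n_dimms must be positive")
    return channel_gbps


def aggregate_local_bandwidth(local_gbps: float, n_dimms: int) -> float:
    """Aggregate local memory bandwidth, which grows with the DIMM count."""
    if n_dimms <= 0:
        raise ValueError("n_dimms must be positive")
    return local_gbps * n_dimms


if __name__ == "__main__":
    channel = 16.0  # hypothetical channel bandwidth in GB/s
    local = 16.0    # hypothetical per-DIMM local bandwidth in GB/s
    for n in (1, 2, 4, 8):
        print(
            n,
            aggregate_local_bandwidth(local, n),
            p2p_bandwidth_per_dimm(channel, n),
            broadcast_bandwidth_per_dimm(channel, n),
        )
```

Under these assumptions, adding DIMMs raises aggregate local bandwidth while shrinking each DIMM's point-to-point share, whereas the broadcast share stays constant, which is the imbalance the abstract describes.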
Author Wei, Shaojun
Yin, Shouyi
Sun, Weiyi
Liu, Leibo
Li, Zhaoshi
Author_xml – sequence: 1
  givenname: Weiyi
  surname: Sun
  fullname: Sun, Weiyi
  organization: Tsinghua University,Beijing National Research Center for Information Science and Technology (BNRist),School of Integrated Circuits,Beijing,China
– sequence: 2
  givenname: Zhaoshi
  surname: Li
  fullname: Li, Zhaoshi
  organization: Tsinghua University,Beijing National Research Center for Information Science and Technology (BNRist),School of Integrated Circuits,Beijing,China
– sequence: 3
  givenname: Shouyi
  surname: Yin
  fullname: Yin, Shouyi
  organization: Tsinghua University,Beijing National Research Center for Information Science and Technology (BNRist),School of Integrated Circuits,Beijing,China
– sequence: 4
  givenname: Shaojun
  surname: Wei
  fullname: Wei, Shaojun
  organization: Tsinghua University,Beijing National Research Center for Information Science and Technology (BNRist),School of Integrated Circuits,Beijing,China
– sequence: 5
  givenname: Leibo
  surname: Liu
  fullname: Liu, Leibo
  email: liulb@tsinghua.edu.cn
  organization: Tsinghua University,Beijing National Research Center for Information Science and Technology (BNRist),School of Integrated Circuits,Beijing,China
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1109/ISCA52012.2021.00027
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE/IET Electronic Library
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 1665433337
9781665433334
EISSN 2575-713X
EndPage 250
ExternalDocumentID 9499805
Genre orig-research
GrantInformation_xml – fundername: National Natural Science Foundation of China
  funderid: 10.13039/501100001809
ID FETCH-LOGICAL-a285t-ce32baf6af701858ca0d58d00c4834a938a47edb34506cb79ea4be4661de0c763
IEDL.DBID RIE
ISICitedReferencesCount 30
IngestDate Wed Aug 27 02:39:28 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
PageCount 14
ParticipantIDs ieee_primary_9499805
PublicationCentury 2000
PublicationDate 2021-June
PublicationDateYYYYMMDD 2021-06-01
PublicationDate_xml – month: 06
  year: 2021
  text: 2021-June
PublicationDecade 2020
PublicationTitle Proceedings - International Symposium on Computer Architecture
PublicationTitleAbbrev ISCA
PublicationYear 2021
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0019956
Score 2.4011698
SourceID ieee
SourceType Publisher
StartPage 237
SubjectTerms Algebra
Bandwidth
Broadcast-Process framework
Computer architecture
inter-DIMM broadcast
Memory modules
near-memory processing
Programming
Software
sparse applications
Tensors
Title ABC-DIMM: Alleviating the Bottleneck of Communication in DIMM-based Near-Memory Processing with Inter-DIMM Broadcast
URI https://ieeexplore.ieee.org/document/9499805