Enabling Efficient Large Recommendation Model Training with Near CXL Memory Processing

Bibliographic Details
Published in: 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pp. 382 - 395
Main Authors: Liu, Haifeng, Zheng, Long, Huang, Yu, Zhou, Jingyi, Liu, Chaoqiang, Wang, Runze, Liao, Xiaofei, Jin, Hai, Xue, Jingling
Format: Conference Proceeding
Language: English
Published: IEEE, 2024-06-29
Subjects:
Online Access: Get full text
Abstract Personalized recommendation systems have become one of the most important Internet services nowadays. A critical challenge of training and deploying recommendation models is their high memory capacity and bandwidth demands, with the embedding layers occupying hundreds of GBs to TBs of storage. The advent of memory disaggregation technology and Compute Express Link (CXL) provides a promising solution for memory capacity scaling. However, relocating memory-intensive embedding layers to CXL memory incurs noticeable performance degradation due to its limited transmission bandwidth, which is significantly lower than the host memory bandwidth. To address this, we introduce ReCXL, a CXL memory disaggregation system that utilizes near-memory processing (NMP) for scalable, efficient recommendation model training. ReCXL features a unified, hardware-efficient NMP architecture that processes the entire embedding training within CXL memory, minimizing data transfers over the bandwidth-limited CXL link and exploiting the higher internal bandwidth. To further improve performance, ReCXL incorporates software-hardware co-optimizations, including sophisticated dependency-free prefetching and fine-grained update scheduling, to maximize hardware utilization. Evaluation results show that ReCXL outperforms the CPU-GPU baseline and the naïve CXL memory by 7.1×–10.6× (9.4× on average) and 12.7×–31.3× (22.6× on average), respectively.
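Illustration (not part of the record): the embedding training that ReCXL keeps inside CXL memory is, at its core, a sparse gather of a few rows from very large tables in the forward pass and a sparse scatter of gradient rows back in the update, which is why it is bandwidth-bound rather than compute-bound. The minimal PyTorch sketch below shows that access pattern; the table size, batch size, pooling mode, and optimizer are illustrative assumptions, not details taken from the paper.

    # Illustrative sketch only: a DLRM-style embedding-bag lookup and sparse update.
    # Each step touches a handful of rows per sample out of a huge table, so the
    # traffic is dominated by irregular row gathers/scatters -- the work a
    # near-memory design would keep local instead of moving over the CXL link.
    import torch

    NUM_ROWS, DIM, BATCH, IDS_PER_SAMPLE = 1_000_000, 64, 2048, 20  # assumed sizes
    table = torch.nn.EmbeddingBag(NUM_ROWS, DIM, mode="sum", sparse=True)
    opt = torch.optim.SGD(table.parameters(), lr=0.01)

    indices = torch.randint(0, NUM_ROWS, (BATCH * IDS_PER_SAMPLE,))  # sparse ids
    offsets = torch.arange(0, BATCH * IDS_PER_SAMPLE, IDS_PER_SAMPLE)  # one bag per sample

    pooled = table(indices, offsets)   # forward: gather rows, pool per sample
    loss = pooled.sum()                # stand-in for the downstream dense-MLP loss
    loss.backward()                    # backward: gradients only for touched rows
    opt.step()                         # update: scatter those rows back into the table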
Author Xue, Jingling
Liu, Chaoqiang
Huang, Yu
Zhou, Jingyi
Wang, Runze
Liao, Xiaofei
Jin, Hai
Zheng, Long
Liu, Haifeng
Author_xml – sequence: 1
  givenname: Haifeng
  surname: Liu
  fullname: Liu, Haifeng
  email: hfliu@hust.edu.cn
  organization: Huazhong University of Science and Technology,National Engineering Research Center for Big Data Technology and System/Services Computing Technology and System Lab/Cluster and Grid Computing Lab,China
– sequence: 2
  givenname: Long
  surname: Zheng
  fullname: Zheng, Long
  email: longzh@hust.edu.cn
  organization: Huazhong University of Science and Technology,National Engineering Research Center for Big Data Technology and System/Services Computing Technology and System Lab/Cluster and Grid Computing Lab,China
– sequence: 3
  givenname: Yu
  surname: Huang
  fullname: Huang, Yu
  email: yuh@hust.edu.cn
  organization: Huazhong University of Science and Technology,National Engineering Research Center for Big Data Technology and System/Services Computing Technology and System Lab/Cluster and Grid Computing Lab,China
– sequence: 4
  givenname: Jingyi
  surname: Zhou
  fullname: Zhou, Jingyi
  email: zjy9695@hust.edu.cn
  organization: Huazhong University of Science and Technology,National Engineering Research Center for Big Data Technology and System/Services Computing Technology and System Lab/Cluster and Grid Computing Lab,China
– sequence: 5
  givenname: Chaoqiang
  surname: Liu
  fullname: Liu, Chaoqiang
  email: chqliu@hust.edu.cn
  organization: Huazhong University of Science and Technology,National Engineering Research Center for Big Data Technology and System/Services Computing Technology and System Lab/Cluster and Grid Computing Lab,China
– sequence: 6
  givenname: Runze
  surname: Wang
  fullname: Wang, Runze
  email: rzwang@hust.edu.cn
  organization: Huazhong University of Science and Technology,National Engineering Research Center for Big Data Technology and System/Services Computing Technology and System Lab/Cluster and Grid Computing Lab,China
– sequence: 7
  givenname: Xiaofei
  surname: Liao
  fullname: Liao, Xiaofei
  email: xfliao@hust.edu.cn
  organization: Huazhong University of Science and Technology,National Engineering Research Center for Big Data Technology and System/Services Computing Technology and System Lab/Cluster and Grid Computing Lab,China
– sequence: 8
  givenname: Hai
  surname: Jin
  fullname: Jin, Hai
  email: hjin@hust.edu.cn
  organization: Huazhong University of Science and Technology,National Engineering Research Center for Big Data Technology and System/Services Computing Technology and System Lab/Cluster and Grid Computing Lab,China
– sequence: 9
  givenname: Jingling
  surname: Xue
  fullname: Xue, Jingling
  email: jingling@cse.unsw.edu.au
  organization: Zhejiang Lab,Hangzhou,China
BookMark eNotj81KAzEURiMoqLVv0EVeoPXmbzJZlqFqYaqiVdyVO8mdGuhkJDMgfXt_V9_mcA7fJTtNfSLGZgIWQoC7Xj9XS-PA2oUEqRcAoIoTNnXWlcqAkoUpxTmbDkNsoABnlS3NBXtdJWwOMe35qm2jj5RGXmPeE38i33cdpYBj7BPf9IEOfJsxph_6M47v_J4w8-qt5hvq-nzkj7n39B1I-yt21uJhoOn_TtjLzWpb3c3rh9t1taznKE05zjU2TXCmQeWk8GiD09aUWmpviIIn2xbeGoFtA2itJe3AhFBKZ2RROA1qwmZ_3khEu48cO8zHnfg9KI36AhEgUpY
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1109/ISCA59077.2024.00036
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9798350326581
EndPage 395
ExternalDocumentID 10609725
Genre orig-research
GrantInformation_xml – fundername: National Natural Science Foundation of China
  funderid: 10.13039/501100001809
– fundername: Research and Development
  funderid: 10.13039/100006190
GroupedDBID 6IE
6IH
ACM
ALMA_UNASSIGNED_HOLDINGS
CBEJK
RIE
RIO
ID FETCH-LOGICAL-a258t-4abbd95ba3921ca7d94758424c5eedce7f6c751afb0a777e4905dd82952669403
IEDL.DBID RIE
ISICitedReferencesCount 6
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001290320700026&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Aug 27 03:08:18 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-a258t-4abbd95ba3921ca7d94758424c5eedce7f6c751afb0a777e4905dd82952669403
PageCount 14
ParticipantIDs ieee_primary_10609725
PublicationCentury 2000
PublicationDate 2024-June-29
PublicationDateYYYYMMDD 2024-06-29
PublicationDate_xml – month: 06
  year: 2024
  text: 2024-June-29
  day: 29
PublicationDecade 2020
PublicationTitle 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)
PublicationTitleAbbrev ISCA
PublicationYear 2024
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssib060973785
Score 2.3127546
Snippet Personalized recommendation systems have become one of the most important Internet services nowadays. A critical challenge of training and deploying the...
SourceID ieee
SourceType Publisher
StartPage 382
SubjectTerms Bandwidth
Data transfer
Degradation
Hardware
Memory management
Prefetching
Recommender systems
Scalability
Training
Web and internet services
Title Enabling Efficient Large Recommendation Model Training with Near CXL Memory Processing
URI https://ieeexplore.ieee.org/document/10609725
WOSCitedRecordID wos001290320700026&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
link http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV3NS8MwFA9uePCk4sRvcvBabbMkrznK2FCYY-CU3Ua-CoJuMjvB_973sk69ePDWNrSFl5T3Xvr7YOxSKkwTXthMOFdmMuoiM1WFhVwQkgTcKueSa8kQRqNyOjXjhqyeuDAxxgQ-i1d0mP7lh4Vf0VYZfuGa1GZUi7UA9JqstVk8NNKFUjX0uCI313cPvRuFzR9gGyhIJDsnIeZfJiophwx2__n2Pdb5YePx8Xee2WdbcX7AnvpEesJz3k8iEHgrHxKqm1ND-YqPWpslcTI7e-GTxgmC074rH-Hy5r3pkN8TzvaTN2wBHO-wx0F_0rvNGo-EzApV1pm0zgWjnMU6p_AWgpHYAUghvYoE8IRKe1CFrVxuASBKk6sQSmEUZmYj8-4ha88X83jEuNbBq4AlnJdaBukMGCgiiJDjFJYQj1mHgjJ7W8tgzDbxOPnj-inbobgTrkqYM9aul6t4zrb9R_38vrxIk_cFiMCanA
linkProvider IEEE
linkToHtml http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV1LS8NAEB60CnpSseLbPXiNJtvdbPYopaXFNBSs0lvZV0DQVmor-O-dSVP14sFbkiUJ7GyYmc33ALgWEtOE4ybi1maRCGkS6bLEQs5zQQJupbWVa0muiiIbj_WwJqtXXJgQQgU-Czd0WP3L9zO3pK0y_MJTUpuRm7BF1lk1XWu9fGispTJZE-SSWN_2H9p3Ets_hY0gJ5nsmKSYf9moVFmku_fP9-9D84ePx4bfmeYANsL0EJ46RHvCc9apZCDwVpYTrptRS_mKj1rZJTGyO3tho9oLgtHOKytwgbP2OGcDQtp-spovgONNeOx2Ru1eVLskRIbLbBEJY63X0hqsdBJnlNcCewDBhZOBIJ6qTJ2SiSltbJRSQehYep9xLTE3axG3jqAxnU3DMbA09U56LOKcSIUXViutkqC4jzGImQon0KRJmbythDAm6_k4_eP6Fez0RoN8kveL-zPYpRgQyorrc2gs5stwAdvuY_H8Pr-sAvkFx2Kd5Q
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=2024+ACM%2FIEEE+51st+Annual+International+Symposium+on+Computer+Architecture+%28ISCA%29&rft.atitle=Enabling+Efficient+Large+Recommendation+Model+Training+with+Near+CXL+Memory+Processing&rft.au=Liu%2C+Haifeng&rft.au=Zheng%2C+Long&rft.au=Huang%2C+Yu&rft.au=Zhou%2C+Jingyi&rft.date=2024-06-29&rft.pub=IEEE&rft.spage=382&rft.epage=395&rft_id=info:doi/10.1109%2FISCA59077.2024.00036&rft.externalDocID=10609725