TrimCaching: Parameter-Sharing AI Model Caching in Wireless Edge Networks

Published in: Proceedings of the International Conference on Distributed Computing Systems, pp. 36–46
Main authors: Qu, Guanqiao, Lin, Zheng, Liu, Fangming, Chen, Xianhao, Huang, Kaibin
Format: Conference paper
Language: English
Publication details: IEEE, 23.07.2024
ISSN: 2575-8411
Abstract Next-generation mobile networks are expected to facilitate fast AI model downloading to end users. By caching models on edge servers, mobile networks can deliver models to end users with low latency, resulting in a paradigm called edge model caching. In this paper, we develop a novel model placement scheme, called parameter-sharing model caching (TrimCaching). TrimCaching exploits the key observation that a wide range of AI models, such as convolutional neural networks or large language models, can share a significant proportion of parameter blocks containing reusable knowledge, thereby improving storage efficiency. To this end, we formulate a parameter-sharing model placement problem to maximize the cache hit ratio in multi-edge wireless networks by balancing the fundamental tradeoff between storage efficiency and service latency. We show that the formulated problem is a submodular maximization problem with submodular constraints, for which no polynomial-time approximation algorithm exists. To overcome this challenge, we study an important special case, where a small fixed number of parameter blocks are shared across models, which often holds in practice. In such a case, a polynomial-time algorithm with a (1 − ε)/2-approximation guarantee is developed. Subsequently, we address the original problem for the general case by developing a greedy algorithm. Simulation results demonstrate that the proposed TrimCaching framework significantly improves the cache hit ratio compared with state-of-the-art content caching without exploiting shared parameters in AI models.
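The abstract names two algorithmic ingredients: a parameter-sharing storage model (a model placed on a server only consumes space for blocks not already cached there) and a greedy placement loop for the general case. The following is a minimal, hypothetical sketch of that idea only, not the paper's actual algorithm; the model/block names, sizes, demand rates, and capacities are all made-up toy values.

```python
"""Greedy sketch of parameter-sharing model placement (illustrative only)."""

from itertools import product

# --- Toy instance (all values are illustrative assumptions) ---
# Each model is a set of parameter-block IDs; blocks shared across
# models are stored only once per edge server.
models = {
    "cnn_a": {"backbone", "head_a"},
    "cnn_b": {"backbone", "head_b"},
    "llm":   {"embed", "decoder"},
}
block_size = {"backbone": 40, "head_a": 5, "head_b": 5,
              "embed": 30, "decoder": 60}
# demand[s][m]: request rate for model m from users covered by server s.
demand = {"s1": {"cnn_a": 0.5, "cnn_b": 0.3, "llm": 0.2},
          "s2": {"cnn_a": 0.1, "cnn_b": 0.1, "llm": 0.8}}
capacity = {"s1": 60, "s2": 100}

def extra_storage(cached_blocks, model):
    """Storage needed to add `model`, counting already-cached blocks as free."""
    return sum(block_size[b] for b in models[model] - cached_blocks)

def greedy_place():
    placed = {s: set() for s in capacity}   # models cached per server
    blocks = {s: set() for s in capacity}   # parameter blocks per server
    used = {s: 0 for s in capacity}         # storage consumed per server
    while True:
        best, best_gain = None, 0.0
        # Scan all feasible (server, model) placements for the best
        # marginal contribution to the cache hit ratio.
        for s, m in product(capacity, models):
            if m in placed[s]:
                continue
            cost = extra_storage(blocks[s], m)
            if used[s] + cost > capacity[s]:
                continue
            gain = demand[s][m]
            if gain > best_gain:
                best, best_gain = (s, m, cost), gain
        if best is None:
            return placed
        s, m, cost = best
        placed[s].add(m)
        blocks[s] |= models[m]
        used[s] += cost

print(greedy_place())
```

In this toy run, once cnn_a is placed on s1, adding cnn_b costs only its 5-unit head because the 40-unit backbone is already cached, which is exactly the storage-efficiency gain the abstract describes; the paper's actual formulation optimizes a cache-hit-ratio objective shown to be submodular, with the (1 − ε)/2 guarantee applying in the small-shared-block special case.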
Author Qu, Guanqiao
Lin, Zheng
Liu, Fangming
Chen, Xianhao
Huang, Kaibin
Author_xml – sequence: 1
  givenname: Guanqiao
  surname: Qu
  fullname: Qu, Guanqiao
  email: gqqu@eee.hku.hk
  organization: University of Hong Kong, Department of Electrical and Electronic Engineering, Pok Fu Lam, Hong Kong SAR, China
– sequence: 2
  givenname: Zheng
  surname: Lin
  fullname: Lin, Zheng
  email: linzheng@eee.hku.hk
  organization: University of Hong Kong, Department of Electrical and Electronic Engineering, Pok Fu Lam, Hong Kong SAR, China
– sequence: 3
  givenname: Fangming
  surname: Liu
  fullname: Liu, Fangming
  email: fangminghk@gmail.com
  organization: Peng Cheng Laboratory, Shenzhen, China
– sequence: 4
  givenname: Xianhao
  surname: Chen
  fullname: Chen, Xianhao
  email: xchen@eee.hku.hk
  organization: University of Hong Kong, Department of Electrical and Electronic Engineering, Pok Fu Lam, Hong Kong SAR, China
– sequence: 5
  givenname: Kaibin
  surname: Huang
  fullname: Huang, Kaibin
  email: huangkb@eee.hku.hk
  organization: University of Hong Kong, Department of Electrical and Electronic Engineering, Pok Fu Lam, Hong Kong SAR, China
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/ICDCS60910.2024.00013
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
Discipline Computer Science
EISBN 9798350386059
EISSN 2575-8411
EndPage 46
ExternalDocumentID 10630945
Genre orig-research
ISICitedReferencesCount 8
IngestDate Wed Aug 27 02:32:38 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
PageCount 11
ParticipantIDs ieee_primary_10630945
PublicationCentury 2000
PublicationDate 2024-July-23
PublicationDateYYYYMMDD 2024-07-23
PublicationDate_xml – month: 07
  year: 2024
  text: 2024-July-23
  day: 23
PublicationDecade 2020
PublicationTitle Proceedings of the International Conference on Distributed Computing Systems
PublicationTitleAbbrev ICDCS
PublicationYear 2024
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 36
SubjectTerms Approximation algorithms
Computational modeling
Edge AI
Edge AI model caching
edge computing
edge intelligence
Greedy algorithms
model downloading
Servers
Simulation
Wireless networks
Title TrimCaching: Parameter-Sharing AI Model Caching in Wireless Edge Networks
URI https://ieeexplore.ieee.org/document/10630945
WOSCitedRecordID wos001304430200004
linkProvider IEEE