DDRM: An SLO-aware Deep Dynamic Resource Management Framework for Microservices

Bibliographic details
Published in: Proceedings / IEEE International Conference on Cluster Computing, pp. 1-12
Main authors: Tang, Liangping; Wang, Jin; Wang, Wanyou; Shi, Gaotao; Li, Zhijun
Format: Conference paper
Language: English
Publication details: IEEE, 2 September 2025
Subject:
ISSN: 2168-9253
Abstract Loosely coupled microservice architectures have been widely adopted in cloud-native applications due to their inherent advantages in modularity, development agility, and scalability. However, the resulting complex and dynamic service topologies introduce intricate inter-service dependencies, which often lead to backpressure effects and queuing delays. These phenomena significantly challenge traditional monolithic and rule-based resource management approaches, which struggle to capture the non-linear performance characteristics and long-term effects of resource allocation decisions in such environments. To address these challenges, we propose DDRM, a two-stage predictor-decider collaborative framework for dynamic resource management in microservice systems. DDRM integrates deep learning to model inter-service interactions and predict the probability of Service Level Objective (SLO) violations, and employs reinforcement learning to optimize resource allocation decisions by maximizing long-term cumulative rewards while meeting SLO targets. Extensive evaluations demonstrate that DDRM outperforms state-of-the-art baselines by up to 29.8 %, while exhibiting strong stability and adaptability under highly varying workloads.
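The two-stage predictor-decider split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a hand-written logistic function stands in for the deep-learning predictor, a one-step greedy choice stands in for the reinforcement-learning policy, and every name, constant, and unit below (`ServiceState`, `predict_violation_probability`, `reward`, `decide`, the 100 requests-per-core threshold, the penalty of 50) is a hypothetical chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class ServiceState:
    cpu_cores: float  # currently allocated CPU cores (hypothetical unit)
    load_rps: float   # observed requests per second (hypothetical metric)


def predict_violation_probability(state: ServiceState, cores: float) -> float:
    """Stage 1 stand-in: estimate P(SLO violation) for a candidate allocation.

    The abstract describes a deep model here; this logistic curve over
    load-per-core and its constants are assumptions for illustration.
    """
    load_per_core = state.load_rps / max(cores, 1e-6)
    return 1.0 / (1.0 + math.exp(-(load_per_core - 100.0) / 10.0))


def reward(cores: float, p_violation: float, penalty: float = 50.0) -> float:
    """Hypothetical reward: resource cost plus a penalty for likely violations."""
    return -cores - penalty * p_violation


def decide(state: ServiceState, candidates: list[float]) -> float:
    """Stage 2 stand-in: greedy one-step choice, not a learned RL policy."""
    return max(
        candidates,
        key=lambda c: reward(c, predict_violation_probability(state, c)),
    )


state = ServiceState(cpu_cores=2.0, load_rps=400.0)
chosen = decide(state, [1.0, 2.0, 4.0, 8.0])
```

In this toy setting the decider consults the predictor for each candidate allocation, which mirrors the collaboration the abstract describes; a reinforcement-learning decider would instead optimize cumulative reward over many steps rather than a single one.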
Author Li, Zhijun
Wang, Jin
Wang, Wanyou
Tang, Liangping
Shi, Gaotao
Author_xml – sequence: 1
  givenname: Liangping
  surname: Tang
  fullname: Tang, Liangping
  email: lptang@stu.suda.edu.cn
  organization: School of Future Science and Engineering,NEIC Laboratory, Soochow University,Soochow,China
– sequence: 2
  givenname: Jin
  surname: Wang
  fullname: Wang, Jin
  email: wjin1985@suda.edu.cn
  organization: School of Future Science and Engineering,NEIC Laboratory, Soochow University,Soochow,China
– sequence: 3
  givenname: Wanyou
  surname: Wang
  fullname: Wang, Wanyou
  email: wangwanyou@stu.hit.edu.cn
  organization: Harbin Institute of Technology,Faculty of Computing,Harbin,China
– sequence: 4
  givenname: Gaotao
  surname: Shi
  fullname: Shi, Gaotao
  email: shgt@tju.edu.cn
  organization: College of Intelligence and Computing, Tianjin University,Tianjin,China
– sequence: 5
  givenname: Zhijun
  surname: Li
  fullname: Li, Zhijun
  email: lizhijun_os@hit.edu.cn
  organization: Harbin Institute of Technology,Faculty of Computing,Harbin,China
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/CLUSTER59342.2025.11186472
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 9798331530198
EISSN 2168-9253
EndPage 12
ExternalDocumentID 11186472
Genre orig-research
GrantInformation_xml – fundername: National Key R&D Program of China
  grantid: 2023YFB4503100
  funderid: 10.13039/501100012166
– fundername: National Natural Science Foundation of China
  grantid: 62072321
  funderid: 10.13039/501100001809
ID FETCH-LOGICAL-i93t-bc5ddb4b22040a610b91841e3bb91cf7b14c2f123c0ec4ee0a834f0f62da5b1f3
IEDL.DBID RIE
IngestDate Wed Nov 05 07:06:02 EST 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-i93t-bc5ddb4b22040a610b91841e3bb91cf7b14c2f123c0ec4ee0a834f0f62da5b1f3
PageCount 12
ParticipantIDs ieee_primary_11186472
PublicationCentury 2000
PublicationDate 2025-Sept.-2
PublicationDateYYYYMMDD 2025-09-02
PublicationDate_xml – month: 09
  year: 2025
  text: 2025-Sept.-2
  day: 02
PublicationDecade 2020
PublicationTitle Proceedings / IEEE International Conference on Cluster Computing
PublicationTitleAbbrev CLUSTER
PublicationYear 2025
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0037306
Score 2.302303
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms Adaptation models
Cloud computing
Collaboration
Deep learning
deep learning for systems
Dynamic scheduling
Microservice architectures
microservices
resource efficiency
resource management
Resource management
Scalability
Stability analysis
Thermal stability
Title DDRM: An SLO-aware Deep Dynamic Resource Management Framework for Microservices
URI https://ieeexplore.ieee.org/document/11186472
hasFullText 1
inHoldings 1
isFullTextHit
isPrint