Derm: SLA-aware Resource Management for Highly Dynamic Microservices

Ensuring efficient resource allocation while providing service level agreement (SLA) guarantees for end-to-end (E2E) latency is crucial for microservice applications. Although existing studies have made significant contributions towards achieving this objective, they primarily concentrate on static...

Celý popis

Uloženo v:
Podrobná bibliografie
Vydáno v:2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA) s. 424 - 436
Hlavní autoři: Chen, Liao, Luo, Shutian, Lin, Chenyu, Mo, Zizhao, Xu, Huanle, Ye, Kejiang, Xu, Chengzhong
Médium: Konferenční příspěvek
Jazyk:angličtina
Vydáno: IEEE 29.06.2024
Témata:
On-line přístup:Získat plný text
Tagy: Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
Popis
Shrnutí:Ensuring efficient resource allocation while providing service level agreement (SLA) guarantees for end-to-end (E2E) latency is crucial for microservice applications. Although existing studies have made significant contributions towards achieving this objective, they primarily concentrate on static graphs. However, microservice graphs are inherently dynamic during runtime in production environments, necessitating more effective and scalable resource management solutions.In this paper, we present Derm, a new resource management system designed for microservice applications with highly dynamic graphs. Our principal finding is that prioritizing different microservice graphs can lead to a substantial reduction in resource allocation. To take advantage of this opportunity, we develop three main components. The first is a performance model that describes uncertainties of microservice latency through a conditional exponential distribution. The second is a probabilistic quantification of the dynamics of microservice graphs. The third is an optimization method for adjusting the resource allocation of microservices to minimize resource usage. We evaluate Derm in our cluster using real microservice benchmarks and production traces. The results highlight that Derm reduces the resource usage by 68.4 \% and lowers SLA violation probability by 6.7 \times, compared to existing approaches.
DOI:10.1109/ISCA59077.2024.00039