Derm: SLA-aware Resource Management for Highly Dynamic Microservices
| Published in: | 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pp. 424-436 |
|---|---|
| Main authors: | , , , , , , |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 29 June 2024 |
| Subjects: | |
| Summary: | Ensuring efficient resource allocation while providing service level agreement (SLA) guarantees for end-to-end (E2E) latency is crucial for microservice applications. Although existing studies have made significant contributions towards achieving this objective, they primarily concentrate on static graphs. However, microservice graphs are inherently dynamic during runtime in production environments, necessitating more effective and scalable resource management solutions. In this paper, we present Derm, a new resource management system designed for microservice applications with highly dynamic graphs. Our principal finding is that prioritizing different microservice graphs can lead to a substantial reduction in resource allocation. To take advantage of this opportunity, we develop three main components. The first is a performance model that describes uncertainties of microservice latency through a conditional exponential distribution. The second is a probabilistic quantification of the dynamics of microservice graphs. The third is an optimization method for adjusting the resource allocation of microservices to minimize resource usage. We evaluate Derm in our cluster using real microservice benchmarks and production traces. The results highlight that Derm reduces resource usage by 68.4% and lowers SLA violation probability by 6.7×, compared to existing approaches. |
| DOI: | 10.1109/ISCA59077.2024.00039 |