Sage: Using Unsupervised Learning for Scalable Performance Debugging in Microservices
| Title: | Sage: Using Unsupervised Learning for Scalable Performance Debugging in Microservices |
|---|---|
| Authors: | Gan, Yu; Liang, Mingyu; Dev, Sundar; Lo, David; Delimitrou, Christina |
| Publisher Information: | 2021-01-01 |
| Document Type: | Electronic Resource |
| Abstract: | Cloud applications are increasingly shifting from large monolithic services to complex graphs of loosely-coupled microservices. Despite the advantages of modularity and elasticity microservices offer, they also complicate cluster management and performance debugging, as dependencies between tiers introduce backpressure and cascading QoS violations. We present Sage, a machine learning-driven root cause analysis system for interactive cloud microservices. Sage leverages unsupervised ML models to circumvent the overhead of trace labeling, captures the impact of dependencies between microservices to determine the root cause of unpredictable performance online, and applies corrective actions to recover a cloud service's QoS. In experiments on both dedicated local clusters and large clusters on Google Compute Engine we show that Sage consistently achieves over 93% accuracy in correctly identifying the root cause of QoS violations, and improves performance predictability. |
| Index Terms: | Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Performance, text |
| Availability: | Open access content. |
| Other Numbers: | COO oai:arXiv.org:2101.00267 1269521132 |
| Contributing Source: | CORNELL UNIV From OAIster®, provided by the OCLC Cooperative. |
| Accession Number: | edsoai.on1269521132 |
| Database: | OAIster |