Perfce: Performance Debugging on Databases with Chaos Engineering-Enhanced Causality Analysis

Debugging performance anomalies in databases is challenging. Causal inference techniques enable qualitative and quantitative root cause analysis of performance downgrades. Nevertheless, causality analysis is challenging in practice, particularly due to limited observability. Recently, chaos engineer...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	IEEE/ACM International Conference on Automated Software Engineering : [proceedings] S. 1454 - 1466
Hauptverfasser:	Ji, Zhenlan, Ma, Pingchuan, Wang, Shuai
Format:	Tagungsbericht
Sprache:	Englisch
Veröffentlicht:	IEEE 11.09.2023
Schlagworte:	Causality Analysis Chaos Chaos Engineering Debugging Distributed databases Observability Performance Debugging Root cause analysis Software systems Software testing
ISSN:	2643-1572
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Debugging performance anomalies in databases is challenging. Causal inference techniques enable qualitative and quantitative root cause analysis of performance downgrades. Nevertheless, causality analysis is challenging in practice, particularly due to limited observability. Recently, chaos engineering (CE) has been applied to test complex software systems. CE frameworks mutate chaos variables to inject catastrophic events (e.g., network slowdowns) to stress-test these software systems. The systems under chaos stress are then tested (e.g., via differential testing) to check if they retain normal functionality, such as returning correct SQL query outputs even under stress. To date, CE is mainly employed to aid software testing. This paper identifies the novel usage of CE in diagnosing performance anomalies in databases. Our framework, PERFCE, has two phases - offline and online. The offline phase learns statistical models of a database using both passive observations and proactive chaos experiments. The online phase diagnoses the root cause of performance anomalies from both qualitative and quantitative aspects on-the-fly. In evaluation, Perfce outperformed previous works on synthetic datasets and is highly accurate and moderately expensive when analyzing real-world (distributed) databases like MySQL and TiDB.
ISSN:	2643-1572
DOI:	10.1109/ASE56229.2023.00106