SLIM: a Scalable and Interpretable Light-weight Fault Localization Algorithm for Imbalanced Data in Microservice


Bibliographic details
Published in: IEEE/ACM International Conference on Automated Software Engineering : [proceedings], pp. 27-39
Main authors: Ren, Rui; Yang, Jingbang; Yang, Linxiao; Gu, Xinyue; Sun, Liang
Format: Conference paper
Language: English
Published: ACM, 27 October 2024
ISSN:2643-1572
Description
Summary: In real-world microservice systems, a newly deployed service (one kind of change service) can introduce a new type of minority fault. Existing state-of-the-art (SOTA) fault localization methods rarely consider imbalanced fault classification in change services. This paper proposes a novel method that uses decision rule sets to handle highly imbalanced data by optimizing the F1 score subject to cardinality constraints. The proposed method greedily generates the rule with maximal marginal gain and uses an efficient minorize-maximization (MM) approach to select rules iteratively, maximizing a non-monotone submodular lower bound. Compared with existing fault localization algorithms, our algorithm adapts to the imbalanced fault scenario of change services and provides interpretable fault causes that are easy to understand and verify. Our method can also be deployed in an online training setting, with only about 15% training overhead compared to current SOTA methods. Empirical studies demonstrate that our algorithm outperforms existing fault localization algorithms in both accuracy and model interpretability.
CCS Concepts: Software and its engineering → Reliability, Debugging.
DOI:10.1145/3691620.3694984
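
The summary describes choosing decision rules greedily by marginal gain in F1 score under a cardinality constraint. The sketch below illustrates only that general idea on toy data: it is a plain greedy selection, not the paper's minorize-maximization procedure, and every name in it (the metric fields `latency_ms` and `error_rate`, the candidate rules, `greedy_select`, `max_rules`) is a hypothetical chosen for illustration rather than taken from the paper.

```python
from typing import Callable, Dict, List, Sequence

Sample = Dict[str, float]
Rule = Callable[[Sample], bool]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def predict(rules: List[Rule], samples: Sequence[Sample]) -> List[int]:
    """A rule set flags a sample as faulty if any of its rules fires."""
    return [int(any(rule(s) for rule in rules)) for s in samples]


def greedy_select(
    candidates: Dict[str, Rule],
    samples: Sequence[Sample],
    labels: Sequence[int],
    max_rules: int,
) -> List[str]:
    """Add the rule with the largest F1 gain until no rule helps or the cap is hit."""
    chosen: List[str] = []
    best = 0.0
    while len(chosen) < max_rules:  # cardinality constraint on the rule set
        best_name, best_score = None, best
        for name, rule in candidates.items():
            if name in chosen:
                continue
            trial = [candidates[n] for n in chosen] + [rule]
            score = f1_score(labels, predict(trial, samples))
            if score > best_score:  # keep only strict improvements
                best_name, best_score = name, score
        if best_name is None:  # no candidate improves F1
            break
        chosen.append(best_name)
        best = best_score
    return chosen


# Toy imbalanced data (assumed values): 2 faulty samples out of 10.
SAMPLES: List[Sample] = [
    {"latency_ms": 900.0, "error_rate": 0.01},
    {"latency_ms": 120.0, "error_rate": 0.30},
] + [{"latency_ms": 100.0 + 10 * i, "error_rate": 0.01} for i in range(8)]
LABELS = [1, 1] + [0] * 8

# Hypothetical candidate rules over the toy metrics.
CANDIDATES: Dict[str, Rule] = {
    "high_latency": lambda s: s["latency_ms"] > 500,
    "high_errors": lambda s: s["error_rate"] > 0.1,
    "any_traffic": lambda s: s["latency_ms"] > 0,
}

selected = greedy_select(CANDIDATES, SAMPLES, LABELS, max_rules=3)
final_f1 = f1_score(LABELS, predict([CANDIDATES[n] for n in selected], SAMPLES))
print(selected, final_f1)
```

In this toy setup the rule that fires on every sample scores poorly because F1 penalizes its false positives, which is one reason F1 is a more useful objective than accuracy when faults are rare. The selected rule names double as a readable explanation of the predicted fault.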