Pre-LogMGAE: Identification of Log Anomalies Using a Pre-Trained Masked Graph Autoencoder

Detailed Description

Bibliographic Details
Published in: Proceedings - Symposium on Reliable Distributed Systems, pp. 294-306
Main Authors: Wu, Aming; Kwon, Young-Woo
Format: Conference Paper
Language: English
Published: IEEE, 30 September 2024
ISSN: 2575-8462
Online Access: Full text
Description
Abstract: Log-based anomaly detection in software systems is becoming increasingly crucial for monitoring network operations and ensuring system security. Deep learning-based methods are widely used for large-scale log anomaly detection due to their capacity to learn complex features. However, current research predominantly treats original logs as simple sequences, ignoring their complex structure and dynamic dependency relationships. Additionally, these methods often rely on extensive labeled data or domain-specific vectors to represent logs for model training; such data is labor-intensive to label manually and can be ineffective across the various domains within a system. To address these challenges, this paper proposes Pre-LogMGAE, a universal masked graph autoencoder (GAE) framework with contrastive learning for self-supervised pre-training for log anomaly detection. In contrast to graph or link reconstruction, Pre-LogMGAE focuses on node feature reconstruction, using a masking strategy to reduce the impact of excessive redundant information. Furthermore, we combine Graph Attention Networks (GAT) with a Gated Recurrent Unit (GRU) to incorporate sequence modeling, capturing both long-term and short-term dependencies among log events. We include contrastive learning objectives during fine-tuning to extract diverse features and enhance the algorithm's robustness. Through an extensive evaluation on three real-world datasets and specific case studies with configuration errors, Pre-LogMGAE demonstrates superior performance compared to six baselines: PCA, IM, DeepLog, LogRobust, LogBERT, and DeepTraLog. This superiority is evident in precision, recall, F1 score, and time efficiency, highlighting Pre-LogMGAE's stability and reliability in anomaly detection. The study aims to improve anomaly detection capabilities for multi-source system logs, offering innovative technical support for enhancing system security and reliability.
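The node-feature masking strategy mentioned in the abstract can be illustrated in rough outline: a fraction of the log-event nodes have their feature vectors replaced by a mask token, and a decoder is then trained to reconstruct the original features for exactly those nodes. The sketch below is a minimal, framework-free illustration of the masking step only; the function name, mask ratio, and mask token are assumptions for illustration and do not come from the paper.

```python
import random

def mask_node_features(features, mask_ratio=0.3, mask_token=0.0, seed=0):
    """Randomly replace a fraction of node feature vectors with a mask token.

    `features` is a list of per-node feature vectors (lists of floats),
    e.g. one vector per log event in the log graph. Returns a masked copy
    and the indices of the masked nodes; in a masked-GAE setup the decoder's
    reconstruction loss would be computed only on those indices.
    """
    rng = random.Random(seed)
    n = len(features)
    k = max(1, int(n * mask_ratio))          # number of nodes to mask
    masked_idx = sorted(rng.sample(range(n), k))
    masked = [list(v) for v in features]     # copy; original stays intact
    for i in masked_idx:
        masked[i] = [mask_token] * len(features[i])
    return masked, masked_idx

# Toy example: 5 log-event nodes with 3-dimensional features.
feats = [[float(i + j) for j in range(3)] for i in range(5)]
masked_feats, idx = mask_node_features(feats, mask_ratio=0.4)
```

In the full framework the masked graph would then pass through the GAT/GRU encoder, and training would minimize the reconstruction error on the masked nodes only; this sketch covers just the input-corruption step.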
DOI:10.1109/SRDS64841.2024.00036