Pre-LogMGAE: Identification of Log Anomalies Using a Pre-Trained Masked Graph Autoencoder

Bibliographic Details
Published in: Proceedings - Symposium on Reliable Distributed Systems, pp. 294-306
Main Authors: Wu, Aming; Kwon, Young-Woo
Format: Conference Proceeding
Language: English
Published: IEEE, 30.09.2024
ISSN: 2575-8462
Description
Summary: Log-based anomaly detection in software systems is becoming increasingly crucial for monitoring network operations and ensuring system security. Deep learning-based methods are widely used for large-scale log anomaly detection due to their capacity to learn complex features. However, current research predominantly treats raw logs as simple sequences, ignoring their complex structure and dynamic dependency relationships. Additionally, these methods often rely on extensive labeled data or domain-specific vectors to represent logs for model training; such labels are labor-intensive to produce manually and transfer poorly across the various domains within a system. To address these challenges, this paper proposes Pre-LogMGAE, a universal masked graph autoencoder (GAE) framework with contrastive learning for self-supervised pre-training for log anomaly detection. In contrast to graph or link reconstruction, Pre-LogMGAE focuses on node feature reconstruction using a masking strategy to reduce the impact of excessive redundant information. Furthermore, we combine Graph Attention Networks (GAT) with a Gated Recurrent Unit (GRU) to incorporate sequence modeling, capturing both long-term and short-term dependencies in log events. We include contrastive learning objectives during fine-tuning to extract diverse features and enhance the algorithm's robustness. Through an extensive evaluation on three real-world datasets and case studies involving configuration errors, Pre-LogMGAE demonstrates superior performance compared to six baselines: PCA, IM, DeepLog, LogRobust, LogBERT, and DeepTraLog. This superiority is evident in precision, recall, F1 score, and time efficiency, highlighting Pre-LogMGAE's stability and reliability in anomaly detection. The study aims to improve anomaly detection capabilities for multi-source system logs, offering innovative technical support to enhance system security and reliability.
DOI: 10.1109/SRDS64841.2024.00036
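To make the core idea in the abstract concrete, the following is a minimal toy sketch (pure Python, with illustrative names; not the authors' implementation) of masked node-feature reconstruction on a log-event graph: features of a random subset of nodes are hidden, each hidden node's features are predicted from its visible neighbours (here a simple neighbour mean stands in for a learned GAT decoder), and the prediction is scored with a reconstruction loss over the masked nodes only.

```python
import random

def mask_nodes(features, mask_ratio, rng):
    """Zero out the feature vectors of a random subset of nodes.
    Returns the masked copies and the set of masked node ids."""
    n = len(features)
    masked_ids = set(rng.sample(range(n), max(1, int(mask_ratio * n))))
    masked = [([0.0] * len(f)) if i in masked_ids else list(f)
              for i, f in enumerate(features)]
    return masked, masked_ids

def reconstruct(masked, adjacency, masked_ids):
    """Predict each masked node's features as the mean of its visible
    neighbours' features -- a crude stand-in for a trained GAT decoder."""
    out = [list(f) for f in masked]
    for i in masked_ids:
        visible = [masked[j] for j in adjacency[i] if j not in masked_ids]
        if visible:
            dim = len(visible[0])
            out[i] = [sum(f[d] for f in visible) / len(visible)
                      for d in range(dim)]
    return out

def masked_mse(pred, target, masked_ids):
    """Mean squared error, computed over masked nodes only."""
    errs = [(p - t) ** 2
            for i in masked_ids
            for p, t in zip(pred[i], target[i])]
    return sum(errs) / len(errs)

# Toy log-event graph: 4 nodes with 2-d features, two connected pairs.
rng = random.Random(0)
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
adj = {0: [1], 1: [0], 2: [3], 3: [2]}

masked, ids = mask_nodes(feats, 0.5, rng)
recon = reconstruct(masked, adj, ids)
loss = masked_mse(recon, feats, ids)
print("masked nodes:", sorted(ids), "reconstruction loss:", round(loss, 4))
```

In the paper's actual framework this reconstruction objective drives self-supervised pre-training of a GAT encoder (with a GRU for sequence modeling), so no anomaly labels are needed at that stage; at inference, events whose features reconstruct poorly are candidate anomalies.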