Improving anomaly detection in software logs through hybrid language modeling and reduced reliance on parser
| Published in: | Automated software engineering Vol. 33; no. 1; p. 12 |
|---|---|
| Main Authors: | , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: Springer US, 01.06.2026; Springer Nature B.V |
| ISSN: | 0928-8910, 1573-7535 |
| Summary: | Anomaly detection in software logs is crucial for development and maintenance, allowing timely identification of system failures and ensuring normal operations. Although recent deep learning advancements in log anomaly detection have shown exceptional performance, the reliance on time-consuming log parsers raises concerns about their necessity for quickly identifying anomalies. Standardized preprocessing methods can mishandle or lose important information. Additionally, the significant imbalance between normal and anomalous log data, along with the scarcity of labeled data, presents a persistent challenge in anomaly detection. We first evaluated the impact of omitting a log parser on anomaly detection models. We then propose LogRoBERTa, an anomaly detection model that eliminates the need for a parser. LogRoBERTa constructs a stable and diverse labeled training set using the Determinantal Point Process (DPP) method, requiring only a small amount of labeled data. The hybrid language model combines RoBERTa's architecture with an attention-based BiLSTM, leveraging RoBERTa's strong contextual understanding and the BiLSTM's capacity to capture sequential dependencies, which improves performance on complex log sequences. Experiments on four widely used datasets demonstrate that LogRoBERTa outperforms state-of-the-art benchmark models, including three fully supervised approaches, without relying on a dedicated log parser. Furthermore, its consistently strong performance on low-resource datasets highlights its robustness and generalizability across varying data conditions. These results validate the overall effectiveness of LogRoBERTa's design and offer a thorough evaluation of the implications of bypassing a log parser. Additionally, our ablation studies and training set construction experiments further confirm the contributions of each individual component to the model's performance. The study empirically validated that a RoBERTa-based approach effectively handles software log anomaly detection in long and complex log sequences, providing a more efficient and robust parser-free solution than existing models. |
|---|---|
| DOI: | 10.1007/s10515-025-00548-y |
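The abstract describes a hybrid architecture: a RoBERTa-style contextual encoder whose token representations feed an attention-based BiLSTM before classification. The paper itself is behind the DOI above, so the following is only a minimal PyTorch sketch of that general pattern; the small `nn.TransformerEncoder` stands in for a pretrained RoBERTa, and all layer sizes, the vocabulary size, and the additive attention pooling are illustrative assumptions, not the authors' published configuration.

```python
import torch
import torch.nn as nn

class HybridLogClassifier(nn.Module):
    """Sketch of a RoBERTa-style encoder followed by an attention-based
    BiLSTM head, in the spirit of the LogRoBERTa abstract. The small
    TransformerEncoder below is a stand-in for pretrained RoBERTa; all
    hyperparameters are assumptions for illustration only."""

    def __init__(self, vocab_size=1000, d_model=64, lstm_hidden=32, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)  # RoBERTa stand-in
        self.bilstm = nn.LSTM(d_model, lstm_hidden,
                              batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * lstm_hidden, 1)   # scores each time step
        self.out = nn.Linear(2 * lstm_hidden, num_classes)

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids))         # contextual token states
        seq, _ = self.bilstm(h)                         # sequential dependencies
        weights = torch.softmax(self.attn(seq), dim=1)  # attention over time steps
        pooled = (weights * seq).sum(dim=1)             # weighted sequence summary
        return self.out(pooled)                         # normal/anomaly logits

# A forward pass over a batch of 4 token sequences of length 16
# yields one logit pair per sequence, i.e. an output of shape (4, 2).
model = HybridLogClassifier()
logits = model(torch.randint(0, 1000, (4, 16)))
```

Pooling the BiLSTM states with learned attention weights, rather than taking only the final hidden state, is what lets such a head emphasize the few log lines in a long sequence that signal an anomaly.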