Unleashing the True Potential of Semantic-Based Log Parsing with Pre-Trained Language Models

Software-intensive systems often produce console logs for troubleshooting purposes. Log parsing, which aims at parsing a log message into a specific log template, typically serves as the first step toward automated log analytics. To better comprehend the semantic information of log messages, many se...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Proceedings / International Conference on Software Engineering S. 975 - 987
Hauptverfasser:	Le, Van-Hoang, Xiao, Yi, Zhang, Hongyu
Format:	Tagungsbericht
Sprache:	Englisch
Veröffentlicht:	IEEE 26.04.2025
Schlagworte:	Business Chatbots Contrastive learning Large language models log analytics log parsing Optimization methods pre-trained LMs Scalability Semantics Software engineering
ISSN:	1558-1225
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Software-intensive systems often produce console logs for troubleshooting purposes. Log parsing, which aims at parsing a log message into a specific log template, typically serves as the first step toward automated log analytics. To better comprehend the semantic information of log messages, many semantic-based log parsers have been proposed. These log parsers fine-tune a small pre-trained language model (PLM) such as RoBERTa on a few labelled log samples. With the increasing popularity of large language models (LLMs), some recent studies also propose to leverage LLMs such as ChatGPT through in-context learning for automated log parsing and obtain better results than previous semantic-based log parsers with small PLMs. In this paper, we show that semantic-based log parsers with small PLMs can actually achieve better or comparable performance to state-of-the-art LLM-based log parsing models while being more efficient and cost-effective. We propose Unleash, a novel semantic-based log parsing approach, which incorporates three enhancement methods to boost the performance of PLMs for log parsing, including (1) an entropy-based ranking method to select the most informative log samples; (2) a contrastive learning method to enhance the fine-tuning process; and (3) an inference optimization method to improve the log parsing performance. We evaluate Unleash on a set of large-scale, public log datasets and the experimental results show that Unleash is effective and efficient compared to state-of-the-art log parsers.
ISSN:	1558-1225
DOI:	10.1109/ICSE55347.2025.00174