Unleashing the True Potential of Semantic-Based Log Parsing with Pre-Trained Language Models

Software-intensive systems often produce console logs for troubleshooting purposes. Log parsing, which aims at parsing a log message into a specific log template, typically serves as the first step toward automated log analytics. To better comprehend the semantic information of log messages, many se...

Celý popis

Uloženo v:

Podrobná bibliografie
Vydáno v:	Proceedings / International Conference on Software Engineering s. 975 - 987
Hlavní autoři:	Le, Van-Hoang, Xiao, Yi, Zhang, Hongyu
Médium:	Konferenční příspěvek
Jazyk:	angličtina
Vydáno:	IEEE 26.04.2025
Témata:	Business Chatbots Contrastive learning Large language models log analytics log parsing Optimization methods pre-trained LMs Scalability Semantics Software engineering
ISSN:	1558-1225
On-line přístup:	Získat plný text
Tagy:	Přidat tag Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!

Popis
Shrnutí:	Software-intensive systems often produce console logs for troubleshooting purposes. Log parsing, which aims at parsing a log message into a specific log template, typically serves as the first step toward automated log analytics. To better comprehend the semantic information of log messages, many semantic-based log parsers have been proposed. These log parsers fine-tune a small pre-trained language model (PLM) such as RoBERTa on a few labelled log samples. With the increasing popularity of large language models (LLMs), some recent studies also propose to leverage LLMs such as ChatGPT through in-context learning for automated log parsing and obtain better results than previous semantic-based log parsers with small PLMs. In this paper, we show that semantic-based log parsers with small PLMs can actually achieve better or comparable performance to state-of-the-art LLM-based log parsing models while being more efficient and cost-effective. We propose Unleash, a novel semantic-based log parsing approach, which incorporates three enhancement methods to boost the performance of PLMs for log parsing, including (1) an entropy-based ranking method to select the most informative log samples; (2) a contrastive learning method to enhance the fine-tuning process; and (3) an inference optimization method to improve the log parsing performance. We evaluate Unleash on a set of large-scale, public log datasets and the experimental results show that Unleash is effective and efficient compared to state-of-the-art log parsers.
ISSN:	1558-1225
DOI:	10.1109/ICSE55347.2025.00174