Unleashing the True Potential of Semantic-Based Log Parsing with Pre-Trained Language Models

Bibliographic Details
Published in: Proceedings / International Conference on Software Engineering, pp. 975-987
Main Authors: Le, Van-Hoang; Xiao, Yi; Zhang, Hongyu
Format: Conference Paper
Language: English
Published: IEEE, 26 April 2025
ISSN: 1558-1225
Description
Summary: Software-intensive systems often produce console logs for troubleshooting purposes. Log parsing, which aims at parsing a log message into a specific log template, typically serves as the first step toward automated log analytics. To better comprehend the semantic information of log messages, many semantic-based log parsers have been proposed. These log parsers fine-tune a small pre-trained language model (PLM) such as RoBERTa on a few labelled log samples. With the increasing popularity of large language models (LLMs), some recent studies also propose to leverage LLMs such as ChatGPT through in-context learning for automated log parsing and obtain better results than previous semantic-based log parsers with small PLMs. In this paper, we show that semantic-based log parsers with small PLMs can actually achieve better or comparable performance to state-of-the-art LLM-based log parsing models while being more efficient and cost-effective. We propose Unleash, a novel semantic-based log parsing approach, which incorporates three enhancement methods to boost the performance of PLMs for log parsing, including (1) an entropy-based ranking method to select the most informative log samples; (2) a contrastive learning method to enhance the fine-tuning process; and (3) an inference optimization method to improve the log parsing performance. We evaluate Unleash on a set of large-scale, public log datasets and the experimental results show that Unleash is effective and efficient compared to state-of-the-art log parsers.
DOI: 10.1109/ICSE55347.2025.00174