LibreLog: Accurate and Efficient Unsupervised Log Parsing Using Open-Source Large Language Models

Bibliographic Details
Published in: Proceedings / International Conference on Software Engineering, pp. 924-936
Main Authors: Ma, Zeyang, Kim, Dong Jae, Chen, Tse-Hsun Peter
Format: Conference Proceeding
Language: English
Published: IEEE 26.04.2025
ISSN: 1558-1225
Abstract Log parsing is a critical step that transforms unstructured log data into structured formats, facilitating subsequent log-based analysis. Traditional syntax-based log parsers are efficient and effective, but their accuracy often drops when they process logs that deviate from the predefined rules. Recently, large language model (LLM)-based log parsers have shown superior parsing accuracy. However, existing LLM-based parsers face three main challenges: 1) time-consuming and labor-intensive manual labeling for fine-tuning or in-context learning, 2) increased parsing costs due to the vast volume of log data and the limited context size of LLMs, and 3) privacy risks from using commercial models like ChatGPT with sensitive log information. To overcome these limitations, this paper introduces LibreLog, an unsupervised log parsing approach that leverages open-source LLMs (i.e., Llama3-8B) to enhance privacy and reduce operational costs while achieving state-of-the-art parsing accuracy. LibreLog first groups logs with similar static text but varying dynamic variables using a fixed-depth grouping tree. It then parses logs within these groups using three components: i) similarity scoring-based retrieval-augmented generation, which selects diverse logs within each group based on Jaccard similarity, helping the LLM distinguish between static text and dynamic variables; ii) self-reflection, which iteratively queries the LLM to refine log templates and improve parsing accuracy; and iii) log template memory, which stores parsed templates to reduce LLM queries and improve parsing efficiency. Our evaluation on LogHub-2.0 shows that LibreLog achieves 25% higher parsing accuracy and processes logs 2.7 times faster than state-of-the-art LLM-based parsers. In short, LibreLog addresses the privacy and cost concerns of using commercial LLMs while achieving state-of-the-art parsing efficiency and accuracy.
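Two of the components named in the abstract, Jaccard-based selection of diverse logs and the log template memory, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how diversity is computed or how templates are matched, so the greedy farthest-first selection, whitespace tokenization, the `<*>` placeholder, and the names `jaccard`, `select_diverse`, and `TemplateMemory` are all assumptions made for illustration.

```python
import re


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B| of two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_diverse(logs: list[str], k: int) -> list[str]:
    """Greedily pick up to k logs that are mutually dissimilar.

    Assumption: start from the first log, then repeatedly add the log whose
    highest Jaccard similarity to the already chosen logs is lowest.
    """
    if not logs:
        return []
    token_sets = [set(log.split()) for log in logs]
    chosen = [0]
    while len(chosen) < min(k, len(logs)):
        best = min(
            (i for i in range(len(logs)) if i not in chosen),
            key=lambda i: max(jaccard(token_sets[i], token_sets[j]) for j in chosen),
        )
        chosen.append(best)
    return [logs[i] for i in chosen]


class TemplateMemory:
    """Hypothetical template store: known templates are matched by regex,
    so logs that fit a stored template need no further LLM query."""

    def __init__(self) -> None:
        self._patterns: list[tuple[re.Pattern, str]] = []

    def add(self, template: str) -> None:
        # Escape the static text and turn each <*> placeholder into a wildcard.
        parts = [re.escape(part) for part in template.split("<*>")]
        self._patterns.append((re.compile("(.+?)".join(parts)), template))

    def lookup(self, log: str):
        # Return the first stored template that matches the whole log line.
        for pattern, template in self._patterns:
            if pattern.fullmatch(log):
                return template
        return None


logs = [
    "Connected to 10.0.0.1 port 22",
    "Connected to 10.0.0.1 port 22",
    "Connected to 10.0.0.2 port 80",
    "Connected to 10.0.0.1 port 80",
]
samples = select_diverse(logs, 2)  # candidates to show the LLM in one prompt

memory = TemplateMemory()
memory.add("Connected to <*> port <*>")  # a template an LLM might return
cached = memory.lookup("Connected to 10.0.0.9 port 443")
```

In this sketch the selected samples differ in both the address and the port, which is the kind of contrast that helps separate static text from dynamic variables, and the final lookup succeeds from memory without another model call.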
Author Ma, Zeyang
Chen, Tse-Hsun Peter
Kim, Dong Jae
Author_xml – sequence: 1
  givenname: Zeyang
  surname: Ma
  fullname: Ma, Zeyang
  email: m_zeyang@encs.concordia.ca
  organization: Concordia University,Software PErformance, Analysis and Reliability (SPEAR) Lab,Montreal,Quebec,Canada
– sequence: 2
  givenname: Dong Jae
  surname: Kim
  fullname: Kim, Dong Jae
  email: djaekim086@gmail.com
  organization: DePaul University,Chicago,Illinois,USA
– sequence: 3
  givenname: Tse-Hsun Peter
  surname: Chen
  fullname: Chen, Tse-Hsun Peter
  email: peterc@encs.concordia.ca
  organization: Concordia University,Software PErformance, Analysis and Reliability (SPEAR) Lab,Montreal,Quebec,Canada
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1109/ICSE55347.2025.00103
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 9798331505691
EISSN 1558-1225
EndPage 936
ExternalDocumentID 11029927
Genre orig-research
ID FETCH-LOGICAL-a260t-776f32d676653518844a90f48d859da689278e82d3a6e924640b2cac74b9bbf43
IEDL.DBID RIE
ISICitedReferencesCount 0
IngestDate Wed Aug 27 01:40:13 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-a260t-776f32d676653518844a90f48d859da689278e82d3a6e924640b2cac74b9bbf43
PageCount 13
ParticipantIDs ieee_primary_11029927
PublicationCentury 2000
PublicationDate 2025-April-26
PublicationDateYYYYMMDD 2025-04-26
PublicationDate_xml – month: 04
  year: 2025
  text: 2025-April-26
  day: 26
PublicationDecade 2020
PublicationTitle Proceedings / International Conference on Software Engineering
PublicationTitleAbbrev ICSE
PublicationYear 2025
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0006499
Score 2.2896988
SourceID ieee
SourceType Publisher
StartPage 924
SubjectTerms Accuracy
Costs
large language model
Large language models
log analysis
log parsing
Memory management
Privacy
Retrieval augmented generation
Software engineering
Solid modeling
Syntactics
Transforms
Title LibreLog: Accurate and Efficient Unsupervised Log Parsing Using Open-Source Large Language Models
URI https://ieeexplore.ieee.org/document/11029927