Guidelines for Assessing the Accuracy of Log Message Template Identification Techniques


Bibliographic Details
Published in: 2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE), pp. 1095-1106
Main Authors: Khan, Zanis Ali, Shin, Donghwan, Bianculli, Domenico, Briand, Lionel
Format: Conference Proceeding
Language: English
Published: ACM 01.05.2022
ISSN: 1558-1225
Abstract: Log message template identification aims to convert raw logs containing free-formed log messages into structured logs to be processed by automated log-based analysis, such as anomaly detection and model inference. While many techniques have been proposed in the literature, only two recent studies provide a comprehensive evaluation and comparison of the techniques using an established benchmark composed of real-world logs. Nevertheless, we argue that both studies have the following issues: (1) they used different accuracy metrics without comparison between them, (2) some ground-truth (oracle) templates are incorrect, and (3) the accuracy evaluation results do not provide any information regarding incorrectly identified templates. In this paper, we address the above issues by providing three guidelines for assessing the accuracy of log template identification techniques: (1) use appropriate accuracy metrics, (2) perform oracle template correction, and (3) perform analysis of incorrect templates. We then assess the application of such guidelines through a comprehensive evaluation of 14 existing template identification techniques on the established benchmark logs. Results show very different insights than existing studies and in particular a much less optimistic outlook on existing techniques.
Authors and affiliations:
1. Khan, Zanis Ali (zanis-ali.khan@uni.lu), University of Luxembourg, Luxembourg, Luxembourg
2. Shin, Donghwan (donghwan.shin@uni.lu), University of Luxembourg, Luxembourg, Luxembourg
3. Bianculli, Domenico (domenico.bianculli@uni.lu), University of Luxembourg, Luxembourg, Luxembourg
4. Briand, Lionel (lionel.briand@uni.lu), University of Luxembourg, Luxembourg, Luxembourg; University of Ottawa, Ottawa, Canada
CODEN IEEPAD
DOI 10.1145/3510003.3510101
Discipline Computer Science
EISBN 9781450392211
1450392210
EISSN 1558-1225
EndPage 1106
ExternalDocumentID 9793976
Genre orig-research
ISICitedReferencesCount 46
IngestDate Wed Aug 27 02:28:32 EDT 2025
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed false
IsScholarly true
OpenAccessLink https://ieeexplore.ieee.org/document/9793976
PageCount 12
SubjectTerms Analytical models
Anomaly detection
Benchmark testing
Guidelines
logs
Measurement
metrics
Software engineering
template identification