A Search-Based Approach for Accurate Identification of Log Message Formats

Detailed bibliography
Published in: 2018 IEEE/ACM 26th International Conference on Program Comprehension (ICPC), pp. 167-16710
Main authors: Messaoudi, Salma, Panichella, Annibale, Bianculli, Domenico, Briand, Lionel, Sasnauskas, Raimondas
Medium: Conference paper
Language: English
Published: ACM 28.05.2018
ISSN:2643-7171
Abstract Many software engineering activities process the events contained in log files. However, before performing any processing activity, it is necessary to parse the entries in a log file to retrieve the actual events recorded in the log. Each event is denoted by a log message, which is composed of a fixed part, called the (event) template, that is the same for all occurrences of the same event type, and a variable part, which may vary with each event occurrence. In complex and evolving systems, the formats of log messages have numerous variations, are typically not entirely known, and change frequently; therefore, they need to be identified automatically. The log message format identification problem deals with identifying the different templates used in the messages of a log. Any solution to this problem has to generate templates that meet two main goals, which conflict with each other: the templates must not be too general, so that different events can be distinguished, but also not too specific, so that different occurrences of the same event are not treated as following different templates. In this paper, we present the MoLFI approach, which recasts the log message format identification problem as a multi-objective problem. MoLFI uses an evolutionary approach to solve this problem, tailoring the NSGA-II algorithm to search the space of solutions for a Pareto optimal set of message templates. We have implemented MoLFI in a tool, which we have evaluated on six real-world datasets containing log files with 2K to 300K entries. The experimental results show that MoLFI extracts by far the highest number of correct log message templates, significantly outperforming two state-of-the-art approaches on all datasets.
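The template/variable-part split and the two conflicting goals described in the abstract can be illustrated with a small sketch. It is not MoLFI: it runs no NSGA-II search, and every detail below is an assumption made for illustration. Messages are tokenized on whitespace, the hypothetical placeholder `<*>` stands for a variable token, "frequency" (mean number of messages matched per template, rewarding generality) and "specificity" (mean share of fixed tokens per template, rewarding precision) are illustrative stand-ins for the two goals, and a candidate set of templates is kept only if every message matches at least one template. The sketch scores a few hand-written candidate sets and keeps the Pareto non-dominated ones, which is the ranking criterion that NSGA-II's non-dominated sorting builds on.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical placeholder for the variable part of a message (an assumption,
# not necessarily the token used by MoLFI).
WILDCARD = "<*>"

Template = List[str]
Message = List[str]


def matches(template: Sequence[str], message: Sequence[str]) -> bool:
    """A message fits a template if the lengths agree and every fixed token is
    equal; a wildcard accepts any single token."""
    if len(template) != len(message):
        return False
    return all(t == WILDCARD or t == m for t, m in zip(template, message))


def covers(templates: List[Template], messages: List[Message]) -> bool:
    """Assumed validity constraint: every message matches at least one template."""
    return all(any(matches(t, m) for t in templates) for m in messages)


def frequency(templates: List[Template], messages: List[Message]) -> float:
    """Illustrative generality objective: mean number of messages matched per template."""
    return sum(sum(matches(t, m) for m in messages) for t in templates) / len(templates)


def specificity(templates: List[Template]) -> float:
    """Illustrative precision objective: mean fraction of fixed tokens per template."""
    return sum(sum(tok != WILDCARD for tok in t) / len(t) for t in templates) / len(templates)


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """Pareto dominance for maximization: no worse everywhere, strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, Tuple[float, float]]) -> List[str]:
    """Names of the candidates that no other candidate dominates."""
    return [name for name, s in scores.items()
            if not any(dominates(other, s) for other in scores.values())]


messages = [line.split() for line in [
    "Connection from 10.0.0.1 closed",
    "Connection from 10.0.0.2 closed",
    "User alice logged in",
]]

# Hand-written candidate template sets (hypothetical examples).
candidates: Dict[str, List[Template]] = {
    "too_specific": [list(m) for m in messages],
    "balanced": [["Connection", "from", WILDCARD, "closed"],
                 ["User", WILDCARD, "logged", "in"]],
    "dominated": [["Connection", WILDCARD, WILDCARD, "closed"],
                  ["User", WILDCARD, "logged", "in"]],
    "too_general": [[WILDCARD] * 4],
}

# Score only the candidate sets that satisfy the coverage constraint.
objectives = {name: (frequency(ts, messages), specificity(ts))
              for name, ts in candidates.items() if covers(ts, messages)}
front = pareto_front(objectives)

if __name__ == "__main__":
    for name, (freq, spec) in objectives.items():
        print(f"{name:13s} frequency={freq:.3f} specificity={spec:.3f}")
    print("Pareto front:", front)
```

In this toy run the `dominated` set is filtered out because `balanced` matches just as many messages per template while keeping more fixed tokens. The over-specific and over-general sets both survive, as the two extremes of the trade-off between not-too-general and not-too-specific templates that the abstract describes.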
Author Sasnauskas, Raimondas
Messaoudi, Salma
Briand, Lionel
Panichella, Annibale
Bianculli, Domenico
Author_xml – sequence: 1
  givenname: Salma
  surname: Messaoudi
  fullname: Messaoudi, Salma
  organization: University of Luxembourg
– sequence: 2
  givenname: Annibale
  surname: Panichella
  fullname: Panichella, Annibale
  organization: University of Luxembourg
– sequence: 3
  givenname: Domenico
  surname: Bianculli
  fullname: Bianculli, Domenico
  organization: University of Luxembourg
– sequence: 4
  givenname: Lionel
  surname: Briand
  fullname: Briand, Lionel
  organization: University of Luxembourg
– sequence: 5
  givenname: Raimondas
  surname: Sasnauskas
  fullname: Sasnauskas, Raimondas
  organization: SES
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1145/3196321.3196340
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL) - NZ
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 1450357148
9781450357142
EISSN 2643-7171
EndPage 16710
ExternalDocumentID 8973030
Genre orig-research
GroupedDBID 6IE
6IF
6IL
6IN
AAJGR
AAWTH
ABLEC
ADZIZ
ALMA_UNASSIGNED_HOLDINGS
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
IEGSK
OCL
RIE
RIL
ID FETCH-LOGICAL-a313t-5cfecdf56a4693406b31c746ef8e7fea8b1657b77a4afad4a4b76fb71ec9ecd73
IEDL.DBID RIE
ISICitedReferencesCount 117
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000555427300017&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Aug 27 02:26:43 EDT 2025
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-a313t-5cfecdf56a4693406b31c746ef8e7fea8b1657b77a4afad4a4b76fb71ec9ecd73
PageCount 16544
ParticipantIDs ieee_primary_8973030
PublicationCentury 2000
PublicationDate 2018-05-28
PublicationDateYYYYMMDD 2018-05-28
PublicationDate_xml – month: 05
  year: 2018
  text: 2018-05-28
  day: 28
PublicationDecade 2010
PublicationTitle 2018 IEEE/ACM 26th International Conference on Program Comprehension (ICPC)
PublicationTitleAbbrev ICPC
PublicationYear 2018
Publisher ACM
Publisher_xml – name: ACM
SSID ssj0003203477
ssj0002869941
Score 2.482794
SourceID ieee
SourceType Publisher
StartPage 167
SubjectTerms Distance measurement
log analysis
log message format
log parsing
NSGA-II
Pareto optimization
Search problems
Software engineering
Title A Search-Based Approach for Accurate Identification of Log Message Formats
URI https://ieeexplore.ieee.org/document/8973030
WOSCitedRecordID wos000555427300017&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=proceeding&rft.title=2018+IEEE%2FACM+26th+International+Conference+on+Program+Comprehension+%28ICPC%29&rft.atitle=A+Search-Based+Approach+for+Accurate+Identification+of+Log+Message+Formats&rft.au=Messaoudi%2C+Salma&rft.au=Panichella%2C+Annibale&rft.au=Bianculli%2C+Domenico&rft.au=Briand%2C+Lionel&rft.date=2018-05-28&rft.pub=ACM&rft.eissn=2643-7171&rft.spage=167&rft.epage=16710&rft_id=info:doi/10.1145%2F3196321.3196340&rft.externalDocID=8973030