A Search-Based Approach for Accurate Identification of Log Message Formats
| Published in: | 2018 IEEE/ACM 26th International Conference on Program Comprehension (ICPC), pp. 167 - 16710 |
|---|---|
| Main authors: | Messaoudi, Salma; Panichella, Annibale; Bianculli, Domenico; Briand, Lionel; Sasnauskas, Raimondas |
| Format: | Conference paper |
| Language: | English |
| Published: | ACM, 28.05.2018 |
| Subjects: | |
| ISSN: | 2643-7171 |
| Abstract | Many software engineering activities process the events contained in log files. Before any such processing, the entries in a log file must be parsed to retrieve the actual events recorded in the log. Each event is denoted by a log message, which is composed of a fixed part, called the (event) template, that is the same for all occurrences of the same event type, and a variable part, which may vary with each event occurrence. In complex and evolving systems, the formats of log messages have numerous variations, are typically not entirely known, and change frequently; therefore, they need to be identified automatically. The log message format identification problem deals with identifying the different templates used in the messages of a log. Any solution to this problem has to generate templates that meet two main goals: they must not be too general, so as to distinguish different events, and not too specific, so that different occurrences of the same event are not treated as following different templates. These two goals conflict. In this paper, we present the MoLFI approach, which recasts the log message format identification problem as a multi-objective problem. MoLFI uses an evolutionary approach to solve this problem, tailoring the NSGA-II algorithm to search the space of solutions for a Pareto-optimal set of message templates. We have implemented MoLFI in a tool, which we have evaluated on six real-world datasets containing log files with between 2K and 300K entries. The experimental results show that MoLFI extracts by far the highest number of correct log message templates, significantly outperforming two state-of-the-art approaches on all datasets. |
|---|---|
| Author | Sasnauskas, Raimondas; Messaoudi, Salma; Briand, Lionel; Panichella, Annibale; Bianculli, Domenico |
| Author_xml | 1. Messaoudi, Salma (University of Luxembourg); 2. Panichella, Annibale (University of Luxembourg); 3. Bianculli, Domenico (University of Luxembourg); 4. Briand, Lionel (University of Luxembourg); 5. Sasnauskas, Raimondas (SES) |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1145/3196321.3196340 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) - NZ url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 1450357148 9781450357142 |
| EISSN | 2643-7171 |
| EndPage | 16710 |
| ExternalDocumentID | 8973030 |
| Genre | orig-research |
| GroupedDBID | 6IE 6IF 6IL 6IN AAJGR AAWTH ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK OCL RIE RIL |
| ID | FETCH-LOGICAL-a313t-5cfecdf56a4693406b31c746ef8e7fea8b1657b77a4afad4a4b76fb71ec9ecd73 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 117 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000555427300017&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Wed Aug 27 02:26:43 EDT 2025 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-a313t-5cfecdf56a4693406b31c746ef8e7fea8b1657b77a4afad4a4b76fb71ec9ecd73 |
| PageCount | 16544 |
| ParticipantIDs | ieee_primary_8973030 |
| PublicationCentury | 2000 |
| PublicationDate | 2018-05-28 |
| PublicationDateYYYYMMDD | 2018-05-28 |
| PublicationDate_xml | – month: 05 year: 2018 text: 2018-05-28 day: 28 |
| PublicationDecade | 2010 |
| PublicationTitle | 2018 IEEE/ACM 26th International Conference on Program Comprehension (ICPC) |
| PublicationTitleAbbrev | ICPC |
| PublicationYear | 2018 |
| Publisher | ACM |
| Publisher_xml | – name: ACM |
| SSID | ssj0003203477 ssj0002869941 |
| Score | 2.482794 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 167 |
| SubjectTerms | Distance measurement; log analysis; log message format; log parsing; NSGA-II; Pareto optimization; Search problems; Software engineering |
| Title | A Search-Based Approach for Accurate Identification of Log Message Formats |
| URI | https://ieeexplore.ieee.org/document/8973030 |
| WOSCitedRecordID | wos000555427300017&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
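
The trade-off described in the abstract, between templates that are too general and templates that are too specific, can be illustrated with a small sketch. This is not the MoLFI implementation, and it does not run NSGA-II. The `*` wildcard, the two scoring functions (`frequency` and `specificity`), and the sample log lines are all assumptions made for illustration. The sketch only shows how two conflicting objectives lead to a Pareto front of candidate templates rather than a single best one.

```python
from typing import List, Sequence, Tuple

WILDCARD = "*"  # hypothetical marker for a variable token in a template


def matches(template: Sequence[str], message: Sequence[str]) -> bool:
    """True if lengths agree and every fixed token equals the message token."""
    return len(template) == len(message) and all(
        t == WILDCARD or t == m for t, m in zip(template, message)
    )


def frequency(template: Sequence[str], messages: List[List[str]]) -> float:
    """Illustrative objective 1: share of messages matched (penalizes over-specific templates)."""
    return sum(matches(template, m) for m in messages) / len(messages)


def specificity(template: Sequence[str]) -> float:
    """Illustrative objective 2: share of fixed tokens (penalizes over-general templates)."""
    return sum(t != WILDCARD for t in template) / len(template)


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """Pareto dominance when maximizing: a is no worse everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: List[List[str]], messages: List[List[str]]) -> List[List[str]]:
    """Keep the candidates whose score vector is not dominated by any other candidate."""
    scored = [(t, (frequency(t, messages), specificity(t))) for t in candidates]
    return [t for t, s in scored if not any(dominates(o, s) for _, o in scored)]


# Made-up log lines, tokenized on whitespace.
log = [line.split() for line in (
    "Connection from 10.0.0.1 closed",
    "Connection from 10.0.0.2 closed",
    "Connection from 10.0.0.3 closed",
)]

candidates = [
    ["*", "*", "*", "*"],                          # too general
    ["Connection", "from", "*", "closed"],         # balanced
    ["Connection", "from", "10.0.0.1", "closed"],  # too specific
]

front = pareto_front(candidates, log)
for template in front:
    print(" ".join(template))
```

Under these assumed objectives, the all-wildcard template is dominated by the balanced one, because it matches the same messages with fewer fixed tokens. The fully specific template stays on the front because it scores higher on specificity. A search-based approach such as the one the abstract describes would explore many more candidates than this hand-written list.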