Practical Preprocessing of Logs at Scale
| Published in: | Proceedings (IEEE/ACM International Conference on Software Engineering Companion. Online), pp. 117-121 |
|---|---|
| Main author: | Zhao, Jianchen |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 27 April 2025 |
| Subjects: | |
| ISSN: | 2574-1934 |
| Online access: | Get full text |
| Abstract | Logs are diverse in structure and large in volume. Although they contain important information about systems at runtime, they must be preprocessed before analysis can be performed. First, logs need to be parsed into a useful format; second, logs often need to be separated into groups to reduce noise. We identify two challenges in adopting log preprocessing at large scale: the preprocessing steps must be generalizable enough to handle the diversity and evolving nature of logs, and efficient enough to keep up with the large volume produced by applications. To tackle these challenges, we first study the identifiers used for log grouping. We then propose an alternative approach based on interleaved sequence models. We also investigate log parsing on console logs, a type of log whose parsing is not well studied. Finally, we propose a log parsing technique based on entropy estimated with a language model. |
|---|---|
| Author | Zhao, Jianchen |
| Author_xml | – sequence: 1 givenname: Jianchen surname: Zhao fullname: Zhao, Jianchen email: jianchen.zhao@uwaterloo.ca organization: University of Waterloo, Department of Electrical and Computer Engineering, Canada |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/ICSE-Companion66252.2025.00036 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| EISBN | 9798331536831 |
| EISSN | 2574-1934 |
| EndPage | 121 |
| ExternalDocumentID | 11024441 |
| Genre | orig-research |
| GroupedDBID | 6IE 6IF 6IL 6IN AAJGR AAWTH ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK OCL RIE RIL |
| ID | FETCH-LOGICAL-a280t-a46bc4ae1dd8d7967d1c86c4e978a85bc1d78136a8fcdd05d651784a0c851a833 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 0 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001554070400026&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Wed Jun 18 06:01:38 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-a280t-a46bc4ae1dd8d7967d1c86c4e978a85bc1d78136a8fcdd05d651784a0c851a833 |
| PageCount | 5 |
| ParticipantIDs | ieee_primary_11024441 |
| PublicationCentury | 2000 |
| PublicationDate | 2025-April-27 |
| PublicationDateYYYYMMDD | 2025-04-27 |
| PublicationDate_xml | – month: 04 year: 2025 text: 2025-April-27 day: 27 |
| PublicationDecade | 2020 |
| PublicationTitle | Proceedings (IEEE/ACM International Conference on Software Engineering Companion. Online) |
| PublicationTitleAbbrev | ICSE-COMPANION |
| PublicationYear | 2025 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssj0003203497 |
| Score | 2.2899272 |
| Snippet | Logs are diverse in structure and large in volume. While containing important information about systems at runtime, they must be preprocessed before analysis... |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 117 |
| SubjectTerms | ai for se; Data mining; Entropy; Industries; Noise; Runtime; Software; software log mining; Springs |
| Title | Practical Preprocessing of Logs at Scale |
| URI | https://ieeexplore.ieee.org/document/11024441 |
| WOSCitedRecordID | wos001554070400026&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |