Practical Preprocessing of Logs at Scale

Bibliographic details
Published in: Proceedings (IEEE/ACM International Conference on Software Engineering Companion. Online), pp. 117-121
Main author: Zhao, Jianchen
Format: Conference paper
Language: English
Published: IEEE, 27.04.2025
ISSN: 2574-1934
Abstract Logs are diverse in structure and large in volume. Although they contain important information about systems at runtime, they must be preprocessed before analysis can be performed. First, logs need to be parsed into a useful format; second, they often need to be separated into groups beforehand to reduce noise. We identify two challenges in adopting log preprocessing at large scale: the preprocessing steps must be generalizable, to handle the diversity and evolving nature of logs, and efficient, to keep up with the large volume produced by applications. To tackle these challenges, we first study the use of identifiers for log grouping. We then propose an alternative approach based on interleaved sequence models. We also investigate the parsing of console logs, a type of log whose parsing is not well studied. Finally, we propose a log parsing technique based on entropy estimated with a language model.
Author Zhao, Jianchen (University of Waterloo, Department of Electrical and Computer Engineering, Canada; jianchen.zhao@uwaterloo.ca)
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/ICSE-Companion66252.2025.00036
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
EISBN 9798331536831
EISSN 2574-1934
EndPage 121
ExternalDocumentID 11024441
Genre orig-research
ISICitedReferencesCount 0
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001554070400026&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Jun 18 06:01:38 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
PageCount 5
PublicationCentury 2000
PublicationDate 2025-April-27
PublicationDateYYYYMMDD 2025-04-27
PublicationDecade 2020
PublicationTitle Proceedings (IEEE/ACM International Conference on Software Engineering Companion. Online)
PublicationTitleAbbrev ICSE-COMPANION
PublicationYear 2025
Publisher IEEE
SourceID ieee
SourceType Publisher
StartPage 117
SubjectTerms ai for se
Data mining
Entropy
Industries
Noise
Runtime
Software
software log mining
Springs
Title Practical Preprocessing of Logs at Scale
URI https://ieeexplore.ieee.org/document/11024441