Practical Preprocessing of Logs at Scale

Bibliographic Details
Published in:	Proceedings (IEEE/ACM International Conference on Software Engineering Companion. Online) pp. 117 - 121
Main Author:	Zhao, Jianchen
Format:	Conference Proceeding
Language:	English
Published:	IEEE 27.04.2025
Subjects:	ai for se; Data mining; Entropy; Industries; Noise; Runtime; Software; software log mining; Springs
ISSN:	2574-1934

Description
Summary:	Logs are diverse in structure and large in volume. Although they contain important information about systems at runtime, they must be preprocessed before analysis can be performed. First, logs need to be parsed into a useful format; second, logs often need to be separated into groups beforehand to reduce noise. We identify two challenges in adopting log preprocessing at large scale: the preprocessing steps must be generalizable enough to handle the diversity and evolving nature of logs, and efficient enough to keep up with the large volume produced by applications. To tackle these challenges, we first study the use of identifiers for log grouping. We then propose an alternative approach based on interleaved sequence models. We also investigate log parsing on console logs, a type of log for which parsing is not well studied. Finally, we propose a log parsing technique based on entropy estimated with a language model.
DOI:	10.1109/ICSE-Companion66252.2025.00036