Practical Preprocessing of Logs at Scale

Bibliographic details
Published in: Proceedings (IEEE/ACM International Conference on Software Engineering Companion. Online), pp. 117-121
Main author: Zhao, Jianchen
Format: Conference proceeding
Language: English
Published: IEEE, 27.04.2025
ISSN:2574-1934
Description
Summary: Logs are diverse in structure and large in volume. Although they contain important information about systems at runtime, they must be preprocessed before analysis can be performed. First, logs need to be parsed into a useful format; second, logs often need to be separated into groups before analysis to reduce noise. We identify two challenges in adopting log preprocessing at large scale: the preprocessing steps must be generalizable enough to handle the diversity and evolving nature of logs, and efficient enough to keep up with the large volume produced by applications. To tackle these challenges, we first study the use of identifiers for log grouping. We then propose an alternative approach based on interleaved sequence models. We also investigate log parsing on console logs, a type of log whose parsing is not well studied. Finally, we propose a log parsing technique based on entropy estimated with a language model.
DOI:10.1109/ICSE-Companion66252.2025.00036