Predictive diagnostics of computer systems logs using natural language processing techniques
| Published in: | Discrete and continuous models and applied computational science Vol. 33; no. 2; pp. 172-183 |
|---|---|
| Main authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: | Peoples’ Friendship University of Russia (RUDN University), 15.07.2025 |
| Subjects: | |
| ISSN: | 2658-4670, 2658-7149 |
| Summary: | This study aims to develop and validate a method for predictive diagnostics and anomaly detection in computer system logs, using the Vertica database as a case study. The proposed approach is based on semi-supervised learning combined with natural language processing techniques. A specialized parser utilizing a semantic graph was developed for data preprocessing. Vectorization was performed using the fastText NLP library and TF-IDF weighting. Empirical validation was conducted on real Vertica log files from a large IT company, containing periods of normal operation and anomalies leading to failures. A comparative assessment of various anomaly detection algorithms was performed, including k-nearest neighbors, autoencoders, One-Class SVM, Isolation Forest, Local Outlier Factor, and Elliptic Envelope. Results are visualized through anomaly graphs depicting time intervals exceeding the threshold level. The findings demonstrate high efficacy of the proposed approach in identifying anomalies preceding system failures and delineate promising directions for further research. |
|---|---|
| DOI: | 10.22363/2658-4670-2025-33-2-172-183 |