Analysis of selected algorithms for detecting outliers in data.

Title: Analysis of selected algorithms for detecting outliers in data.
Authors: Nowak-Brzezińska, Agnieszka (AUTHOR) agnieszka.nowak-brzezinska@us.edu.pl; Jasiak, Dawid (AUTHOR)
Source: Procedia Computer Science. 2024, Vol. 246, pp. 2205-2214 (10 pp.).
Subjects: Sampling (Process), Algorithms
Abstract: This study involves analysis of various machine learning models for anomaly detection, focusing on diverse aspects such as sampling types, model tuning, and the size of training sets. By exploring a broad spectrum of supervised, unsupervised, and semi-supervised algorithms, the research aims to furnish a nuanced understanding of model performance dynamics within the anomaly detection domain. A significant finding from this analysis is the superior performance of tuned models over their untuned counterparts, thereby underscoring the pivotal role of hyperparameter tuning in augmenting algorithmic efficiency. The examination of sampling methodologies reveals that non-augmented sampling techniques strike an optimal balance between accuracy and training duration. Although hybrid and oversampling methods necessitate extended training periods, they nonetheless yield competitive outcomes. Conversely, undersampling approaches facilitate expedited training processes although at the expense of reduced average AUCPR values. [ABSTRACT FROM AUTHOR]
Database: Supplemental Index
ISSN: 1877-0509
DOI: 10.1016/j.procs.2024.09.582
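The abstract compares sampling strategies by training time and by average AUCPR. The sketch below is only a rough illustration of those terms under stated assumptions; it is not the authors' code. It assumes binary labels (1 = anomaly, 0 = normal) and shows plain random undersampling and random oversampling, which are stand-ins for whatever specific samplers the paper used. It also estimates AUCPR as step-wise average precision. The helper names `random_undersample`, `random_oversample`, and `average_precision` are hypothetical, and the exact AUCPR estimator used in the study is not stated in this record.

```python
import random


def _split_classes(y):
    """Return (minority_indices, majority_indices) for binary labels 0/1."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    return (pos, neg) if len(pos) <= len(neg) else (neg, pos)


def random_undersample(X, y, seed=0):
    """Hypothetical helper: drop majority rows at random until classes are equal."""
    rng = random.Random(seed)
    minority, majority = _split_classes(y)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in keep], [y[i] for i in keep]


def random_oversample(X, y, seed=0):
    """Hypothetical helper: duplicate minority rows at random until classes are equal."""
    rng = random.Random(seed)
    minority, majority = _split_classes(y)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    keep = sorted(minority + majority + extra)
    return [X[i] for i in keep], [y[i] for i in keep]


def average_precision(y_true, scores):
    """Step-wise AUCPR estimate: mean precision at the rank of each true anomaly.

    Rows are ranked by descending score; ties keep their input order.
    """
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, ap = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            ap += hits / rank  # precision at this recall step
    return ap / total_pos


if __name__ == "__main__":
    X = list(range(8))
    y = [0, 0, 0, 0, 0, 0, 1, 1]
    Xu, yu = random_undersample(X, y)
    Xo, yo = random_oversample(X, y)
    print(len(Xu), len(Xo))  # 4 12
    print(average_precision([0, 1, 0, 1], [0.1, 0.9, 0.8, 0.3]))  # about 0.8333
```

With 6 normal rows and 2 anomalies, undersampling leaves a 4-row training set and oversampling produces a 12-row one. This size difference is one plausible reason, consistent with the abstract, why undersampling trains faster and oversampling trains slower. For the scores shown, the anomalies land at ranks 1 and 3, so average precision is (1/1 + 2/3) / 2 = 5/6.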