A Robust Statistical Framework for Outlier Detection and Its Influence on Predictive Modeling Accuracy
| Published in: | Journal of Al-Qadisiyah for Computer Science and Mathematics Vol. 17; no. 3 |
|---|---|
| Main Authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: | 30.09.2025 |
| ISSN: | 2074-0204, 2521-3504 |
| Summary: | Outliers, defined as observations that deviate substantially from the majority of data, pose a serious challenge to predictive modeling by distorting estimation, increasing variance, and reducing model reliability. Although numerous statistical and machine learning approaches for outlier detection have been proposed, their direct influence on prediction accuracy across real-world domains has received limited attention. This study develops a robust statistical framework that integrates univariate, multivariate, and machine learning–based detection methods with confirmatory regression diagnostics and a bootstrap-driven model selection strategy. Candidate anomalies are first identified through histogram- and IQR-based screening, kNN and LOF density–proximity measures, and isolation forest and one-class SVM classifiers. They are then statistically validated using standardized residuals and Cook’s distance, while robustness is reinforced through MM-estimation and bounded loss functions. Evaluation is conducted using both synthetic contamination experiments and real datasets from finance, healthcare, and marketing, comparing models trained with and without detected outliers across classifiers such as SVM, logistic regression, KNN, random forest, and AdaBoost. The results demonstrate that excluding or down-weighting outliers consistently enhances predictive accuracy and stability, particularly in settings with heavy-tailed errors and heterogeneous distributions. The proposed framework provides a practical and statistically principled approach for improving model fidelity, offering broad applicability across diverse domains where reliable prediction is essential. |
| DOI: | 10.29304/jqcsm.2025.17.32424 |
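
The record does not include the authors' code. The following Python sketch, assuming scikit-learn and statsmodels, only illustrates how the screening and confirmation stages described in the abstract could be wired together: an IQR screen, LOF and isolation forest votes, confirmation via standardized residuals and Cook's distance, and a bounded-loss robust refit. The synthetic data, the majority-vote rule, and the cutoffs (|residual| > 2.5, Cook's D > 4/n) are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch (not the authors' implementation): multi-stage outlier
# screening followed by confirmatory regression diagnostics.
import numpy as np
import statsmodels.api as sm
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)

# Synthetic contamination experiment: linear signal plus a few gross outliers.
n = 300
X = rng.normal(size=(n, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=n)
y[:10] += rng.normal(loc=15.0, scale=2.0, size=10)  # contaminated responses

Z = np.column_stack([X, y])

# 1) Univariate IQR screen on the response.
q1, q3 = np.percentile(y, [25, 75])
iqr = q3 - q1
iqr_flag = (y < q1 - 1.5 * iqr) | (y > q3 + 1.5 * iqr)

# 2) Density/proximity and ensemble detectors on the joint feature-response space.
lof_flag = LocalOutlierFactor(n_neighbors=20).fit_predict(Z) == -1
iso_flag = IsolationForest(random_state=0).fit_predict(Z) == -1

# Majority vote across the three screens gives the candidate anomaly set.
candidates = (iqr_flag.astype(int) + lof_flag.astype(int) + iso_flag.astype(int)) >= 2

# 3) Confirmatory diagnostics: standardized residuals and Cook's distance from OLS.
ols = sm.OLS(y, sm.add_constant(X)).fit()
influence = ols.get_influence()
std_resid = influence.resid_studentized_internal
cooks_d = influence.cooks_distance[0]
confirmed = candidates & ((np.abs(std_resid) > 2.5) | (cooks_d > 4.0 / n))

# 4) Robust refit that down-weights rather than deletes flagged points.
#    Tukey's biweight gives a bounded loss; note this is an M-estimator,
#    standing in for the MM-estimation named in the abstract.
robust = sm.RLM(y, sm.add_constant(X), M=sm.robust.norms.TukeyBiweight()).fit()

print(f"candidates: {candidates.sum()}, confirmed outliers: {confirmed.sum()}")
print("robust coefficients:", robust.params)
```

In the evaluation the abstract describes, the confirmed set would then be excluded (or down-weighted) before training the downstream classifiers (SVM, logistic regression, KNN, random forest, AdaBoost) and comparing accuracy against models trained on the full, contaminated data.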