A Robust Statistical Framework for Outlier Detection and Its Influence on Predictive Modeling Accuracy

Bibliographic details
Published in: Journal of Al-Qadisiyah for Computer Science and Mathematics, Volume 17, Issue 3
Main authors: Kamil Habeeb, Hadeel; Hatem Hassan, Faten
Format: Journal Article
Language: English
Published: 30 September 2025
ISSN: 2074-0204, 2521-3504
Abstract: Outliers, defined as observations that deviate substantially from the majority of data, pose a serious challenge to predictive modeling by distorting estimation, increasing variance, and reducing model reliability. Although numerous statistical and machine learning approaches for outlier detection have been proposed, their direct influence on prediction accuracy across real-world domains has received limited attention. This study develops a robust statistical framework that integrates univariate, multivariate, and machine learning–based detection methods with confirmatory regression diagnostics and a bootstrap-driven model selection strategy. Candidate anomalies are first identified through histogram- and IQR-based screening, kNN and LOF density–proximity measures, and isolation forest and one-class SVM classifiers. They are then statistically validated using standardized residuals and Cook’s distance, while robustness is reinforced through MM-estimation and bounded loss functions. Evaluation is conducted using both synthetic contamination experiments and real datasets from finance, healthcare, and marketing, comparing models trained with and without detected outliers across classifiers such as SVM, logistic regression, kNN, random forest, and AdaBoost. The results demonstrate that excluding or down-weighting outliers consistently enhances predictive accuracy and stability, particularly in settings with heavy-tailed errors and heterogeneous distributions. The proposed framework provides a practical and statistically principled approach for improving model fidelity, offering broad applicability across diverse domains where reliable prediction is essential.
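The two-stage idea in the abstract (screen for candidates first, then confirm them with regression diagnostics) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple one-predictor least-squares fit, uses only IQR screening from the detection stage, and adopts two common rules of thumb as cutoffs (Tukey's 1.5 × IQR fences and a Cook's distance threshold of 4/n). The function names and the toy data are hypothetical.

```python
import math
import statistics


def iqr_flags(values, k=1.5):
    """Return True for each value outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


def regression_diagnostics(x, y):
    """Fit y = a + b*x by least squares; return standardized residuals and Cook's distances."""
    n, p = len(x), 2  # p = number of fitted coefficients (intercept and slope)
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance estimate
    # Leverage of each point in simple linear regression.
    lev = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    # Internally studentized (standardized) residuals.
    std_resid = [e / math.sqrt(s2 * (1 - h)) for e, h in zip(resid, lev)]
    # Cook's distance expressed through the standardized residual and leverage.
    cooks = [(r * r / p) * h / (1 - h) for r, h in zip(std_resid, lev)]
    return std_resid, cooks


def confirm_outliers(x, y, k=1.5):
    """Keep IQR candidates whose Cook's distance exceeds the assumed 4/n cutoff."""
    n = len(x)
    candidates = iqr_flags(y, k)
    _, cooks = regression_diagnostics(x, y)
    return [i for i in range(n) if candidates[i] and cooks[i] > 4 / n]


# Hypothetical toy data: a roughly linear trend with one contaminated response.
x = list(range(1, 11))
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 16.2, 17.8, 40.0]
confirmed = confirm_outliers(x, y)
print(confirmed)
```

Requiring both the screening flag and the influence check mirrors the abstract's distinction between candidate anomalies and statistically validated ones. A fuller version would add the multivariate and machine learning detectors and a robust refit such as MM-estimation, none of which is attempted here.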
DOI: 10.29304/jqcsm.2025.17.32424
License: http://creativecommons.org/licenses/by-nc-nd/4.0
Open access link: https://jqcsm.qu.edu.iq/index.php/journalcm/article/download/2424/1122