A Robust Statistical Framework for Outlier Detection and Its Influence on Predictive Modeling Accuracy
Outliers, defined as observations that deviate substantially from the majority of data, pose a serious challenge to predictive modeling by distorting estimation, increasing variance, and reducing model reliability. Although numerous statistical and machine learning approaches for outlier detection h...
| Published in: | Journal of Al-Qadisiyah for Computer Science and Mathematics, Volume 17, Issue 3 |
|---|---|
| Main authors: | Hadeel Kamil Habeeb, Faten Hatem Hassan |
| Format: | Journal Article |
| Language: | English |
| Publication details: | 30.09.2025 |
| ISSN: | 2074-0204, 2521-3504 |
| Online access: | Get full text |
| Abstract | Outliers, defined as observations that deviate substantially from the majority of data, pose a serious challenge to predictive modeling by distorting estimation, increasing variance, and reducing model reliability. Although numerous statistical and machine learning approaches for outlier detection have been proposed, their direct influence on prediction accuracy across real-world domains has received limited attention. This study develops a robust statistical framework that integrates univariate, multivariate, and machine learning–based detection methods with confirmatory regression diagnostics and a bootstrap-driven model selection strategy. Candidate anomalies are first identified through histogram- and IQR-based screening, kNN and LOF density–proximity measures, and isolation forest and one-class SVM classifiers. They are then statistically validated using standardized residuals and Cook’s distance, while robustness is reinforced through MM-estimation and bounded loss functions. Evaluation is conducted using both synthetic contamination experiments and real datasets from finance, healthcare, and marketing, comparing models trained with and without detected outliers across classifiers such as SVM, logistic regression, KNN, random forest, and AdaBoost. The results demonstrate that excluding or down-weighting outliers consistently enhances predictive accuracy and stability, particularly in settings with heavy-tailed errors and heterogeneous distributions. The proposed framework provides a practical and statistically principled approach for improving model fidelity, offering broad applicability across diverse domains where reliable prediction is essential. |
|---|---|
| Author | Kamil Habeeb, Hadeel; Hatem Hassan, Faten |
| Author_xml | (1) given name: Hadeel; surname: Kamil Habeeb; full name: Kamil Habeeb, Hadeel. (2) given name: Faten; surname: Hatem Hassan; full name: Hatem Hassan, Faten |
| ContentType | Journal Article |
| DBID | AAYXX CITATION |
| DOI | 10.29304/jqcsm.2025.17.32424 |
| DatabaseName | CrossRef |
| DatabaseTitle | CrossRef |
| DatabaseTitleList | CrossRef |
| DeliveryMethod | fulltext_linktorsrc |
| EISSN | 2521-3504 |
| ExternalDocumentID | 10_29304_jqcsm_2025_17_32424 |
| GroupedDBID | AAYXX ALMA_UNASSIGNED_HOLDINGS CITATION OK1 |
| ID | FETCH-LOGICAL-c914-7c3c7a1010c161d8e3ad1718a5b6edfcc71339c892f1e981e66e7e4070f4ec0c3 |
| ISSN | 2074-0204 |
| IngestDate | Wed Nov 05 20:54:12 EST 2025 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | false |
| IsScholarly | false |
| Issue | 3 |
| Language | English |
| License | http://creativecommons.org/licenses/by-nc-nd/4.0 |
| LinkModel | OpenURL |
| MergedId | FETCHMERGED-LOGICAL-c914-7c3c7a1010c161d8e3ad1718a5b6edfcc71339c892f1e981e66e7e4070f4ec0c3 |
| OpenAccessLink | https://jqcsm.qu.edu.iq/index.php/journalcm/article/download/2424/1122 |
| ParticipantIDs | crossref_primary_10_29304_jqcsm_2025_17_32424 |
| PublicationCentury | 2000 |
| PublicationDate | 2025-09-30 |
| PublicationDateYYYYMMDD | 2025-09-30 |
| PublicationDate_xml | – month: 09 year: 2025 text: 2025-09-30 day: 30 |
| PublicationDecade | 2020 |
| PublicationTitle | Journal of Al-Qadisiyah for Computer Science and Mathematics |
| PublicationYear | 2025 |
| SSID | ssib044751960 ssib016479590 ssib032177102 ssib046619541 |
| Score | 1.9233497 |
| SourceID | crossref |
| SourceType | Index Database |
| Title | A Robust Statistical Framework for Outlier Detection and Its Influence on Predictive Modeling Accuracy |
| Volume | 17 |
| hasFullText | 1 |
| inHoldings | 1 |
| journalDatabaseRights | – providerCode: PRVHPJ databaseName: ROAD: Directory of Open Access Scholarly Resources customDbUrl: eissn: 2521-3504 dateEnd: 99991231 omitProxy: false ssIdentifier: ssib044751960 issn: 2074-0204 databaseCode: M~E dateStart: 20090101 isFulltext: true titleUrlDefault: https://road.issn.org providerName: ISSN International Centre |
| linkProvider | ISSN International Centre |
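
The abstract describes a two-stage idea: screen for candidate anomalies (for example with an IQR rule), then confirm them with regression diagnostics such as Cook's distance before comparing models trained with and without the flagged points. The following is a minimal sketch of that idea only, not the authors' implementation. The function names, the Tukey multiplier `k=1.5`, the `4/n` Cook's distance cutoff, and the synthetic data are all assumptions chosen for illustration; the abstract does not state any thresholds.

```python
import numpy as np


def iqr_flags(values, k=1.5):
    """Flag values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


def ols_fit(x, y):
    """Ordinary least squares with an intercept; returns (design, coefficients)."""
    design = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design, beta


def cooks_distance(x, y):
    """Cook's distance of every observation under the OLS fit of y on x."""
    design, beta = ols_fit(x, y)
    n, p = design.shape
    resid = y - design @ beta
    # Leverage values: diagonal of the hat matrix H = A @ pinv(A).
    leverage = np.einsum("ij,ji->i", design, np.linalg.pinv(design))
    s2 = resid @ resid / (n - p)  # residual variance estimate
    return resid**2 / (p * s2) * leverage / (1.0 - leverage) ** 2


# Synthetic contamination experiment (illustrative data, not from the paper).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=x.size)
y[-1] += 25.0  # inject one gross outlier at a high-leverage position

# Stage 1: univariate screening. Stage 2: confirmation by influence.
candidates = iqr_flags(y)
influential = cooks_distance(x, y) > 4.0 / x.size  # conventional rule of thumb
confirmed = np.flatnonzero(candidates & influential)

# Compare the fitted slope with and without the confirmed outliers.
_, beta_all = ols_fit(x, y)
keep = np.ones(x.size, dtype=bool)
keep[confirmed] = False
_, beta_clean = ols_fit(x[keep], y[keep])

print("confirmed outlier indices:", confirmed.tolist())
print("slope with outliers:", beta_all[1], "without:", beta_clean[1])
```

Requiring both stages to agree keeps the univariate screen from discarding points that are extreme but consistent with the regression. A fuller version in the spirit of the abstract would add multivariate detectors (LOF, isolation forest, one-class SVM) and a robust estimator in place of outright deletion.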