Diabetes Risk Prediction using Shapley Additive Explanations for Feature Engineering
| Title: | Diabetes Risk Prediction using Shapley Additive Explanations for Feature Engineering |
|---|---|
| Authors: | Chinwe Miracle Chituru, Sin-Ban Ho, Ian Chai |
| Source: | Journal of Informatics and Web Engineering, Vol 4, Iss 2, Pp 18-35 (2025) |
| Publisher Information: | MMU Press, 2025. |
| Publication Year: | 2025 |
| Collection: | LCC:Electronic computers. Computer science LCC:Information technology |
| Subject Terms: | diabetes risk prediction, decision tree algorithm, additive explanations, feature engineering, data visualization, Electronic computers. Computer science, QA75.5-76.95, Information technology, T58.5-58.64 |
| Description: | Diabetes is prevalent globally, and its prevalence is expected to increase over the next few years for both type 1 and type 2 diabetes. Dietary choices and lack of exercise are the main causes of this increase. This global health challenge calls for effective prediction and early management of the disease. This research uses a decision tree algorithm to predict diabetes risk and integrates SHapley Additive exPlanations (SHAP) for feature engineering and model interpretability. Random forest and gradient boosting models were also developed to identify risk factors and to compare their predictions with those of the decision tree model. The classifiers were evaluated on accuracy, F1-score, precision, and recall. Understanding which features drive predictions can enhance clinical decision-making as much as predictive accuracy does. On a dataset of 520 instances with 17 features (including the target output), the proposed decision tree model achieved an accuracy of 97%. The categorical variables used by the decision tree model enable straightforward data visualization. After the model was developed, SHAP was applied to interpret its predictions. This is crucial for healthcare practitioners because it points to specific health metrics for identifying high-risk diabetic patients. Preliminary results indicate that polyuria, polydipsia, and age together are predictors of diabetes risk. The study highlights that integrating SHAP with the decision tree algorithm provides both predictive capability and transparent model interpretability, and it contributes to the growing body of literature on machine learning in healthcare. The results support applying this methodology for prediction in clinical settings, fostering trust in the approach among practitioners and patients alike. |
| Document Type: | article |
| File Description: | electronic resource |
| Language: | English |
| ISSN: | 2821-370X |
| Relation: | https://journals.mmupress.com/index.php/jiwe/article/view/1387; https://doaj.org/toc/2821-370X |
| DOI: | 10.33093/jiwe.2025.4.2.2 |
| Access URL: | https://doaj.org/article/51efe5b545bb42fbb252885db4a6b77c |
| Accession Number: | edsdoj.51efe5b545bb42fbb252885db4a6b77c |
| Database: | Directory of Open Access Journals |
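The abstract describes using SHAP to attribute a decision tree's predictions to individual features such as polyuria, polydipsia, and age. The Python sketch below illustrates what a Shapley value measures. It computes exact Shapley values by brute-force enumeration of feature coalitions for a tiny decision rule. That rule, `toy_tree`, is hypothetical, with invented thresholds and scores; it is not the authors' fitted model. Features that are "absent" from a coalition take their value from a single hypothetical baseline patient. This is an assumption: SHAP tooling such as the `shap` package's tree explainers uses faster algorithms and a background dataset rather than one reference point. The numbers below therefore illustrate the concept only.

```python
from itertools import combinations
from math import factorial

FEATURES = ["polyuria", "polydipsia", "age"]


def toy_tree(polyuria, polydipsia, age):
    """Hypothetical decision rule returning a diabetes-risk score.

    Illustrative only: these thresholds and scores are invented and are
    NOT the decision tree fitted in the paper.
    """
    if polyuria == 1:
        return 0.9 if polydipsia == 1 else 0.7
    if polydipsia == 1:
        return 0.6
    return 0.4 if age >= 50 else 0.1


def shapley_values(model, x, baseline):
    """Exact Shapley values for model(*x), enumerating every coalition.

    Features outside a coalition take their value from `baseline`
    (a single reference point, which is an assumption; SHAP normally
    averages over a background dataset). Cost grows as 2**n, so this
    is only practical for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without_i = [x[j] if j in coalition else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]  # add feature i to the coalition
                # Weighted marginal contribution of feature i to this coalition
                phi[i] += weight * (model(*with_i) - model(*without_i))
    return phi


# Hypothetical patient (polyuria=1, polydipsia=1, age 60) vs. a hypothetical baseline
patient = (1, 1, 60)
baseline = (0, 0, 30)
phi = shapley_values(toy_tree, patient, baseline)
for name, value in zip(FEATURES, phi):
    print(f"{name}: {value:+.3f}")
```

For this toy rule the attributions come out to +0.400 (polyuria), +0.300 (polydipsia), and +0.100 (age). They sum to `toy_tree(*patient) - toy_tree(*baseline)` = 0.9 - 0.1 = 0.8. This is the efficiency property that makes Shapley-based explanations additive.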