A novel machine learning framework for stroke type identification in resource constrained settings with robustness to missing data

Detailed bibliography
Title: A novel machine learning framework for stroke type identification in resource constrained settings with robustness to missing data
Authors: Aman Bhardwaj, Yamini Antil, M. V. Padma Srivastava, Pulikottil Wilson Vinny, Venugopalan Y. Vishnu, Rahul Garg
Source: Scientific Reports, Vol 15, Iss 1, Pp 1-16 (2025)
Publisher information: Nature Portfolio, 2025.
Publication year: 2025
Collection: LCC:Medicine
LCC:Science
Subjects: Machine learning, Stroke classification, Resource limited settings, Multiple imputation by chained equation, MICE, SHAP, Medicine, Science
Description: Abstract: Stroke is the third leading cause of disability and mortality worldwide. Accurate identification of stroke type (ischemic or hemorrhagic) is critical for guiding treatment; however, it typically requires costly neuroimaging, which is often inaccessible in rural areas of developing countries. This study uses machine learning (ML) to identify stroke type from clinical data alone, aiming to provide a cost-effective method for resource-limited settings that lack neuroimaging facilities. A dataset of 2,190 stroke patients with 79 clinical attributes was collected in-house and used to develop the proposed ML framework. The framework handles missing data through Multiple Imputation by Chained Equations (MICE), so it remains usable even when some laboratory test results are unavailable. The study also addresses target leakage through statistical tests and uses SHAP analysis to identify the attributes most important for classification. The proposed framework achieves 82.42% weighted accuracy, 82.33% accuracy, 82.19% sensitivity, 82.65% specificity, and an 86.68% F1-score. Notably, with only the 19 most significant attributes identified via SHAP, the framework maintains a weighted accuracy of 82.20%. Prospective validation on an independent dataset demonstrates a 16.42% improvement over the best-performing clinical score, Siriraj. The proposed ML framework may help reduce time to treatment for patients in resource-limited settings by enabling prompt primary care and timely referral to stroke-ready facilities.
Document type: article
File description: electronic resource
Language: English
ISSN: 2045-2322
Relation: https://doaj.org/toc/2045-2322
DOI: 10.1038/s41598-025-16660-8
Access URL: https://doaj.org/article/ba57daa4281d4398a05c9af80a6d40b7
Accession number: edsdoj.ba57daa4281d4398a05c9af80a6d40b7
Database: Directory of Open Access Journals
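Note on the reported metrics: the abstract does not define "weighted accuracy". One common reading, assumed here and not confirmed by the record, is balanced accuracy, the mean of sensitivity and specificity. The reported figures are consistent with that reading, since (82.19 + 82.65) / 2 = 82.42. The sketch below illustrates the arithmetic; the function names and the example counts are hypothetical and are not taken from the paper.

```python
def rates_from_counts(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Return (sensitivity, specificity) from confusion-matrix counts.

    Which stroke type is treated as the positive class is not stated
    in the abstract, so the labels here are generic.
    """
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity


def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Mean of sensitivity and specificity (assumed meaning of 'weighted accuracy')."""
    return (sensitivity + specificity) / 2.0


# Check against the percentages reported in the abstract.
print(round(balanced_accuracy(82.19, 82.65), 2))

# Hypothetical counts, for illustration only.
sens, spec = rates_from_counts(tp=8, fn=2, tn=9, fp=1)
print(sens, spec)
```

Unlike plain accuracy, this average gives both classes equal weight regardless of how many patients fall in each, which is likely why the two reported values (82.42% and 82.33%) differ slightly.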