Machine learning-based models for screening of anemia and leukemia using features of complete blood count reports

Bibliographic details
Published in:	Scientific Reports Vol. 15; no. 1; pp. 33333 - 14
Main authors:	Amjad, Hafsa; Hussain, Zamir; Hasan, Mahnoor; Ul Hassan, Mahmood
Format:	Journal Article
Language:	English
Published:	London: Nature Publishing Group UK, 29 September 2025; Nature Publishing Group; Nature Portfolio
Subjects:	631/67; 631/67/2322; Adult; Age groups; Aged; Algorithms; Anemia; Anemia - blood; Anemia - diagnosis; Artificial intelligence; Blood; Blood Cell Count - methods; Blood diseases; Blood tests; CBC reports; Clinical decision support; Data collection; Datasets; Feature selection; Female; Females; Gender; Hematological diseases; Hematology; Hemoglobin; Heterogeneity; Humanities and Social Sciences; Humans; Laboratories; Learning algorithms; Leukemia; Leukemia - blood; Leukemia - diagnosis; Leukocytes; Machine Learning; Male; Medical personnel; Medical research; Middle Aged; multidisciplinary; Neutrophils; Performance evaluation; Science; Science (multidisciplinary); Statistics; Support Vector Machine
ISSN:	2045-2322

Description
Summary:	Complete blood count (CBC) report features are routinely used to screen a wide array of hematological disorders. However, the complexity of disease overlap increases the probability of neglecting the underlying patterns between these features, and the heterogeneity associated with the subjective assessment of CBC reports often leads to random clinical testing. Such disease prediction analyses can be enhanced by incorporating machine learning (ML) algorithms for efficient handling of CBC features. Hybrid synthetic data are generated based on the statistical distribution of features to overcome the constraint of small sample size (N = 287). To the best of our knowledge, our study is the first to employ hybrid synthetic data for modeling hematological parameters. Six ML models, i.e., decision tree, random forest, support vector machine, logistic regression, gradient boosting machine, and multilayer perceptron, are tested for disease prediction. This research presents ML-based models for the screening of two common blood disorders, anemia and leukemia, using CBC report features. A 'fingerprint' of 14 out of 21 features is selected for model development based on both statistical and clinical relevance. The random forest algorithm showed exceptional performance, with 98% accuracy and macro-averaged precision, recall, specificity, and miss rate of 97%, 98%, 99%, and 2%, respectively, across all classes. However, external validation of the model reveals poor generalizability on a different demographic dataset, where the model obtained an accuracy of 74%. The proposed methodology may serve as an efficient support system for the screening of anemia and leukemia. However, extensive optimization with regard to its generalizability is warranted.
DOI:	10.1038/s41598-025-21279-w
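
The summary reports macro-averaged precision, recall, specificity, and miss rate. As a reading aid, the sketch below shows one common way such macro-averages are derived from a multiclass confusion matrix, treating each class one-vs-rest and averaging the per-class values with equal weight. It is not taken from the article: the function name `macro_metrics`, the three-class layout, and the counts are all hypothetical, and the authors' exact computation may differ.

```python
def macro_metrics(cm):
    """Macro-averaged metrics from a square confusion matrix.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j. Each class is scored one-vs-rest, then the
    per-class scores are averaged with equal weight (macro-average).
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    per_class = []
    for k in range(n):
        tp = cm[k][k]                                # class k correctly predicted
        fn = sum(cm[k]) - tp                         # class k predicted as another class
        fp = sum(cm[i][k] for i in range(n)) - tp    # other classes predicted as k
        tn = total - tp - fn - fp                    # everything not involving class k
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        miss_rate = fn / (fn + tp) if fn + tp else 0.0   # equals 1 - recall
        per_class.append((precision, recall, specificity, miss_rate))

    names = ("precision", "recall", "specificity", "miss_rate")
    result = {
        name: sum(scores[i] for scores in per_class) / n
        for i, name in enumerate(names)
    }
    result["accuracy"] = sum(cm[k][k] for k in range(n)) / total
    return result


# Hypothetical counts for three illustrative classes (rows = true, columns = predicted).
# These numbers are made up and are NOT results from the article.
cm = [
    [8, 1, 1],
    [0, 9, 1],
    [2, 0, 8],
]
metrics = macro_metrics(cm)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the miss rate is the complement of recall for each class, the macro-averaged recall and miss rate sum to 1, which is consistent with the 98% and 2% figures quoted in the summary.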