A deep learning framework with hybrid stacked sparse autoencoder for type 2 diabetes prediction

Sparse numerical datasets are dominant in fields such as applied mathematics, astronomy, finance, and healthcare, presenting challenges due to their high dimensionality and sparse distribution. The predominance of zero values complicates optimal feature selection, making data analysis and model perf...

Celý popis

Uloženo v:
Podrobná bibliografie
Vydáno v:Scientific reports Ročník 15; číslo 1; s. 36678 - 22
Hlavní autoři: Abdussamad, Daud, Hanita, Sokkalingam, Rajalingam, Zubair, Muhammad, Khan, Iliyas Karim, Mahmood, Zafar
Médium: Journal Article
Jazyk:angličtina
Vydáno: London Nature Publishing Group UK 21.10.2025
Nature Publishing Group
Nature Portfolio
Témata:
ISSN:2045-2322, 2045-2322
On-line přístup:Získat plný text
Tagy: Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
Popis
Shrnutí:Sparse numerical datasets are dominant in fields such as applied mathematics, astronomy, finance, and healthcare, presenting challenges due to their high dimensionality and sparse distribution. The predominance of zero values complicates optimal feature selection, making data analysis and model performance more complex. To overcome this challenge, this study introduces a deep learning-based algorithm, Hybrid Stacked Sparse Autoencoder (HSSAE), which integrates and regularization with binary cross-entropy loss to improve feature selection efficiency, where regularization penalizes large weights, simplifying data representations, while regularization prevents overfitting by limiting the total weight size. Additionally, the dropout technique enhances the algorithm’s performance by randomly deactivating neurons during training, avoiding over-reliance on specific features. Meanwhile, batch normalization stabilizes weight distributions, reducing computational complexity and accelerating the convergence. The proposed algorithm, HSSAE, was evaluated against traditional classifiers, including Decision Tree, Random Forest, K-Nearest Neighbors, and Naïve Bayes, as well as deep learning-based models, such as Convolutional Neural Network, Long Short-Term Memory, and Stacked Sparse Autoencoder, in terms of Precision, Recall, Accuracy, F1-score, AUC, and Hamming Loss. Quantitatively, the proposed algorithm, HSSAE, was tested on two different sparse datasets, demonstrating superior performance with the highest accuracy of 89% on the health indicator dataset and 93% on the EHRs diabetes prediction dataset, respectively, and outperforming competing classifiers. The proposed algorithm, HSSAE, extracts features effectively and enhances robustness, making it well-suited for sparse data applications, particularly in healthcare, where high prediction accuracy is crucial.
Bibliografie:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
ISSN:2045-2322
2045-2322
DOI:10.1038/s41598-025-20534-4