A deep learning framework with hybrid stacked sparse autoencoder for type 2 diabetes prediction
| Published in: | Scientific Reports Vol. 15; Iss. 1; pp. 36678 - 22 |
|---|---|
| Main authors: | , , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | London; Nature Publishing Group UK; 21 October 2025; Nature Publishing Group; Nature Portfolio |
| Subjects: | |
| ISSN: | 2045-2322 |
| Online access: | Full text |
| Abstract: | Sparse numerical datasets are dominant in fields such as applied mathematics, astronomy, finance, and healthcare, presenting challenges due to their high dimensionality and sparse distribution. The predominance of zero values complicates optimal feature selection, making data analysis and model performance more complex. To overcome this challenge, this study introduces a deep learning-based algorithm, Hybrid Stacked Sparse Autoencoder (HSSAE), which integrates L1 and L2 regularization with binary cross-entropy loss to improve feature selection efficiency: L1 regularization penalizes large weights, simplifying data representations, while L2 regularization prevents overfitting by limiting the total weight size. Additionally, dropout enhances the algorithm's performance by randomly deactivating neurons during training, avoiding over-reliance on specific features, while batch normalization stabilizes weight distributions, reducing computational complexity and accelerating convergence. HSSAE was evaluated against traditional classifiers (Decision Tree, Random Forest, K-Nearest Neighbors, and Naïve Bayes) and against deep learning-based models (Convolutional Neural Network, Long Short-Term Memory, and Stacked Sparse Autoencoder) in terms of Precision, Recall, Accuracy, F1-score, AUC, and Hamming Loss. Tested on two sparse datasets, HSSAE outperformed the competing classifiers, reaching its highest accuracy of 89% on the health indicator dataset and 93% on the EHRs diabetes prediction dataset. HSSAE extracts features effectively and enhances robustness, making it well-suited for sparse data applications, particularly in healthcare, where high prediction accuracy is crucial. |
|---|---|
| DOI: | 10.1038/s41598-025-20534-4 |
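
The abstract describes a training objective that adds two weight penalties to a binary cross-entropy loss. As a rough illustration only, the sketch below assumes these are the usual L1 (sum of absolute weights) and L2 (sum of squared weights) penalties. The function names and the coefficients `lam1` and `lam2` are hypothetical and are not taken from the paper, and this is not the authors' implementation.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy for labels in {0, 1} and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never receives 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def weight_penalty(weights, lam1, lam2):
    """Combined penalty: lam1 * sum(|w|) + lam2 * sum(w^2).

    lam1 and lam2 are illustrative coefficients (assumptions, not from the paper).
    """
    l1_term = sum(abs(w) for w in weights)
    l2_term = sum(w * w for w in weights)
    return lam1 * l1_term + lam2 * l2_term


def regularized_loss(y_true, y_prob, weights, lam1=1e-4, lam2=1e-4):
    """Binary cross-entropy plus the combined L1/L2 weight penalty."""
    return binary_cross_entropy(y_true, y_prob) + weight_penalty(weights, lam1, lam2)


if __name__ == "__main__":
    # Toy values chosen only to exercise the function.
    print(regularized_loss([1, 0], [0.9, 0.2], [0.5, -0.25, 0.0]))
```

In a real network, `weights` would be the flattened parameters of the encoder layers, and dropout and batch normalization would be applied inside the model rather than in the loss.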