Hybrid stacked sparse autoencoder for robust feature extraction and classification in sparse data across multiple domains

Bibliographic details
Published in: Machine learning with applications, vol. 22, p. 100764
Main authors: Abdussamad, Abdulkadir, Said Jadid, Daud, Hanita, Sokkalingam, Rajalingam, Khan, Iliyas Karim
Format: Journal Article
Language: English
Published: Elsevier Ltd, 1 December 2025
ISSN: 2666-8270
Description
Abstract: Tabular data is the most widely used data format in applied mathematics, cybersecurity, finance, and healthcare, and it poses distinct challenges because of its intrinsic sparsity, with the majority of values being zero. This sparsity hinders effective feature selection and reduces prediction accuracy. The Stacked Sparse Autoencoder (SSAE) model has shown great promise for feature selection in prediction tasks. However, SSAE struggles to extract meaningful features from sparse data and requires an additional machine learning classifier on the latent space for accurate predictions, which increases computational complexity. This paper presents a Hybrid Stacked Sparse Autoencoder (HSSAE) algorithm that uses a custom hybrid loss function, α(L1) + (1−α)L2 with binary cross-entropy, to address these limitations. The proposed algorithm offers a unified framework that integrates feature selection and prediction for sparse data, improving feature extraction and reducing computational complexity. Three datasets, with sparsity levels of 43%, 53.32%, and 74.41%, were used in experiments to assess the HSSAE algorithm. Across several evaluation criteria, HSSAE performed much better than the conventional approach of pairing the SSAE latent space with machine learning classifiers such as Logistic Regression (LR), Support Vector Machine (SVM), XGBoost, and AdaBoost. HSSAE also surpassed deep learning algorithms, including Convolutional Neural Networks (CNN), Multilayer Perceptron Networks (MLP), and Recurrent Neural Networks (RNN), in sparse data prediction tasks. Its ability to select effective features makes the model robust and suitable for sparse data applications, especially sensitive ones such as healthcare and cybersecurity, which require high prediction accuracy.
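The abstract names the loss only as α(L1) + (1−α)L2 with binary cross-entropy and does not say how the terms are defined. The following is a minimal NumPy sketch of one plausible reading, not the authors' implementation. It assumes that L1 and L2 are the mean absolute and mean squared reconstruction errors of the autoencoder, that the binary cross-entropy term is simply added for the classification output, and that sparsity is measured as the fraction of exactly-zero entries. The function names `sparsity`, `binary_cross_entropy`, and `hybrid_loss` and the default `alpha=0.5` are illustrative choices.

```python
import numpy as np


def sparsity(X):
    """Fraction of entries in X that are exactly zero (assumed definition)."""
    X = np.asarray(X)
    return float(np.mean(X == 0))


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 so the logarithms stay finite.
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    per_sample = y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob)
    return float(-np.mean(per_sample))


def hybrid_loss(x, x_hat, y_true, y_prob, alpha=0.5):
    """alpha * L1 + (1 - alpha) * L2 on the reconstruction, plus BCE on the prediction.

    ASSUMPTION: L1 and L2 are reconstruction errors (mean absolute and mean
    squared). The abstract does not confirm this; they could instead be
    weight or activation penalties in the published method.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    l1 = np.mean(np.abs(x - x_hat))   # mean absolute reconstruction error
    l2 = np.mean((x - x_hat) ** 2)    # mean squared reconstruction error
    return float(alpha * l1 + (1.0 - alpha) * l2 + binary_cross_entropy(y_true, y_prob))
```

Under these assumptions, `alpha` interpolates between a purely absolute-error reconstruction term (`alpha=1`) and a purely squared-error one (`alpha=0`), while the cross-entropy term lets a single network be trained for reconstruction and prediction together instead of fitting a separate classifier on the latent space.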
DOI: 10.1016/j.mlwa.2025.100764