Hybrid stacked sparse autoencoder for robust feature extraction and classification in sparse data across multiple domains

Published in: Machine Learning with Applications, Vol. 22, p. 100764
Main authors: Abdussamad; Abdulkadir, Said Jadid; Daud, Hanita; Sokkalingam, Rajalingam; Khan, Iliyas Karim
Format: Journal Article
Language: English
Published: Elsevier Ltd, 1 December 2025
ISSN: 2666-8270
Description
Summary: Tabular data is the most widely used data format in applied mathematics, cybersecurity, finance, and healthcare, and it poses distinct challenges because of its intrinsic sparsity: the majority of values are zero. This sparsity inhibits effective feature selection and reduces prediction accuracy. The Stacked Sparse Autoencoder (SSAE) model has shown great promise for feature selection in prediction tasks. However, SSAE struggles to extract meaningful features from sparse data and requires an additional machine learning classifier on its latent space to make accurate predictions, which increases computational complexity. This paper presents a Hybrid-Stacked Sparse Autoencoder (HSSAE) algorithm with a custom hybrid loss function, α·L1 + (1−α)·L2 combined with binary cross-entropy, to address these limitations. The proposed algorithm offers a unified framework that integrates feature selection and prediction on sparse data, improving feature extraction and reducing the computational complexity of sparse data prediction. Three datasets, with sparsity levels of 43%, 53.32%, and 74.41%, were used in experiments to assess the performance of the HSSAE algorithm. Evaluated on several criteria, HSSAE substantially outperformed the conventional approach of pairing the SSAE latent space with machine learning classifiers such as Logistic Regression (LR), Support Vector Machine (SVM), XGBoost, and AdaBoost. HSSAE also surpassed deep learning algorithms, including Convolutional Neural Networks (CNN), Multilayer Perceptron Networks (MLP), and Recurrent Neural Networks (RNN), on these sparse data prediction tasks. Its effective feature selection makes the model robust and well suited to sparse data applications, especially sensitive ones such as healthcare and cybersecurity that require high prediction accuracy.
DOI: 10.1016/j.mlwa.2025.100764
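
Illustrative note: the summary mentions two quantities that a short sketch may make more concrete. These are a dataset's sparsity level (taken here to be the fraction of entries that are exactly zero) and a hybrid loss of the form α·L1 + (1−α)·L2 combined with binary cross-entropy. The summary does not say how the terms are combined, so the Python sketch below rests on an assumption: L1 and L2 are the mean absolute and mean squared reconstruction errors of the autoencoder, and the binary cross-entropy of the class prediction is simply added to them. The function names `sparsity_level` and `hybrid_loss` and all parameter names are hypothetical and do not come from the paper.

```python
import numpy as np


def sparsity_level(X):
    """Fraction of entries in X that are exactly zero (assumed definition)."""
    X = np.asarray(X)
    return float(np.count_nonzero(X == 0)) / X.size


def hybrid_loss(x, x_hat, y, y_prob, alpha=0.5, eps=1e-7):
    """Sketch of alpha*L1 + (1 - alpha)*L2 plus binary cross-entropy.

    ASSUMPTION: L1/L2 are reconstruction errors between input x and
    reconstruction x_hat, and BCE is added with unit weight. The paper
    may combine or weight these terms differently.
    """
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    # Clip predicted probabilities so the logarithms stay finite.
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)

    l1 = np.mean(np.abs(x - x_hat))      # mean absolute reconstruction error
    l2 = np.mean((x - x_hat) ** 2)       # mean squared reconstruction error
    bce = -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return alpha * l1 + (1.0 - alpha) * l2 + bce


# Toy usage: a 2x2 matrix with three zero entries is 75% sparse.
X = np.array([[0.0, 1.0], [0.0, 0.0]])
print(sparsity_level(X))
print(hybrid_loss(X, np.zeros_like(X), y=[1, 0], y_prob=[0.9, 0.2], alpha=0.5))
```

Under this reading, α = 1 reduces the reconstruction term to a pure L1 penalty and α = 0 to a pure L2 penalty. Training one objective that includes the classification term is one way a model could avoid a separate classifier on the latent space, which is the motivation the summary gives.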