Elastic deep autoencoder for text embedding clustering by an improved graph regularization


Bibliographic Details
Published in: Expert Systems with Applications, Vol. 238, p. 121780
Authors: Daneshfar, Fatemeh; Soleymanbaigi, Sayvan; Nafisi, Ali; Yamini, Pedram
Format: Journal Article
Language: English
Published: Elsevier Ltd, 15 March 2024
ISSN: 0957-4174, 1873-6793
Online access: Full text
Description
Abstract: Text clustering groups information extracted from text into clusters, with many applications in recommender systems, sentiment analysis, and more. Deep learning-based methods have become increasingly popular due to their high accuracy in identifying nonlinear structures. They usually consist of two major parts: dimensionality reduction and clustering. Autoencoders are simple unsupervised neural networks that learn low-dimensional representations of data and have shown good performance on non-linear features. However, while their Frobenius-norm loss deals well with Gaussian noise, it is sensitive to outliers and Laplacian noise. In this paper, a deep autoencoder with an adapted elastic loss for text embedding clustering (EDA-TEC) is proposed. The elastic loss combines the Frobenius norm and the L2,1-norm to handle both types of noise. Additionally, to preserve the geometric structure of the high-dimensional data, a modified graph regularization term based on a weighted cosine similarity measure is used. EDA-TEC further improves clustering results by imposing a sparsity regularization on the manifold representation. In this jointly trained, end-to-end deep learning model, better representations and text clustering results are achieved with high accuracy on common datasets compared to existing methods (code: https://github.com/safinal/text-embedding-clustering).
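The two loss components named in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration under assumptions of our own: the trade-off weights `alpha`/`beta`, the row-wise form of the L2,1-norm, and the plain cosine-similarity affinity are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def elastic_loss(X, X_hat, alpha=1.0, beta=1.0):
    """Elastic reconstruction loss: squared Frobenius norm (handles
    Gaussian noise) plus L2,1-norm (robust to outlier rows and
    Laplacian noise). alpha/beta are illustrative trade-off weights."""
    E = X - X_hat
    frob_sq = np.sum(E ** 2)                       # ||E||_F^2
    l21 = np.sum(np.sqrt(np.sum(E ** 2, axis=1)))  # sum of row-wise L2 norms
    return alpha * frob_sq + beta * l21

def cosine_similarity_graph(X):
    """Affinity matrix from pairwise cosine similarity of the rows of X
    (one simple choice of weighted cosine similarity; the paper's
    exact weighting may differ)."""
    Xn = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    W = Xn @ Xn.T
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def graph_regularizer(H, W):
    """Graph regularization term Tr(H^T L H) with Laplacian L = D - W:
    small when points that are similar in the input space stay close
    in the embedding H."""
    L = np.diag(W.sum(axis=1)) - W
    return np.trace(H.T @ L @ H)
```

For example, `elastic_loss` on X = [[3, 4], [0, 0]] against a zero reconstruction gives 25 (Frobenius part) + 5 (L2,1 part) = 30, and `graph_regularizer` vanishes when all embedding rows are identical, since L annihilates the constant vector.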
DOI: 10.1016/j.eswa.2023.121780