Elastic deep autoencoder for text embedding clustering by an improved graph regularization
Saved in:
| Published in: | Expert Systems with Applications, Vol. 238, Art. 121780 |
|---|---|
| Main authors: | , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 15.03.2024 |
| Subjects: | |
| ISSN: | 0957-4174, 1873-6793 |
| Online access: | Full text |
| Abstract: | Text clustering is the task of grouping texts into clusters based on information extracted from them, with many applications in recommender systems, sentiment analysis, and more. Deep learning-based methods have become increasingly popular due to their high accuracy in identifying nonlinear structures. They usually consist of two major parts: dimensionality reduction and clustering. Autoencoders are simple unsupervised neural networks used for better low-dimensional representation of data and have shown good performance in dealing with nonlinear features. However, while their Frobenius-norm loss deals well with Gaussian noise, they are sensitive to outliers and Laplacian noise. In this paper, a deep autoencoder with an adapted elastic loss for text embedding clustering (EDA-TEC) is proposed. The elastic loss is a combination of the Frobenius norm and the L2,1-norm, accounting for both types of noise. Additionally, to preserve the geometric structure of the high-dimensional data, a modified graph regularization term based on a weighted cosine similarity measure is used. EDA-TEC further improves clustering results by imposing a sparsity regularization on the manifold representation. This jointly trained, end-to-end deep learning model achieves better representations and text clustering results with high accuracy on common datasets compared to existing methods. Code: https://github.com/safinal/text-embedding-clustering |
|---|---|
| ISSN: | 0957-4174, 1873-6793 |
| DOI: | 10.1016/j.eswa.2023.121780 |
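The abstract describes the elastic loss as a combination of the squared Frobenius norm (well suited to Gaussian noise) and the L2,1-norm (robust to outliers and Laplacian noise) applied to the autoencoder's reconstruction error. A minimal numpy sketch of that combination is shown below; the mixing weight `lam` and the specific weighting scheme are assumptions for illustration, as the record does not give the paper's exact formulation.

```python
import numpy as np

def elastic_loss(X, X_hat, lam=0.5):
    """Hedged sketch of an elastic reconstruction loss.

    Combines the squared Frobenius norm ||E||_F^2 with the L2,1-norm
    ||E||_{2,1} (sum of row-wise Euclidean norms) of the reconstruction
    error E = X - X_hat. The L2,1 term down-weights the influence of
    outlier rows relative to the squared Frobenius term. `lam` is a
    hypothetical balancing parameter, not taken from the paper.
    """
    E = X - X_hat                                 # reconstruction error matrix
    frob = np.sum(E ** 2)                         # ||E||_F^2
    l21 = np.sum(np.linalg.norm(E, axis=1))       # ||E||_{2,1}
    return (1.0 - lam) * frob + lam * l21

# Perfect reconstruction gives zero loss
X = np.array([[1.0, 2.0], [3.0, 4.0]])
print(elastic_loss(X, X))  # → 0.0
```

Because the L2,1 term grows linearly (not quadratically) with the size of a row-wise error, a single corrupted sample contributes far less to the gradient than it would under a pure Frobenius loss, which is the robustness property the abstract attributes to the elastic combination.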