Deep embedding clustering based on contractive autoencoder

Bibliographic Details
Published in: Neurocomputing (Amsterdam) Vol. 433; pp. 96-107
Main Authors: Diallo, Bassoma; Hu, Jie; Li, Tianrui; Khan, Ghufran Ahmad; Liang, Xinyan; Zhao, Yimiao
Format: Journal Article
Language: English
Published: Elsevier B.V., 14.04.2021
Subjects:
ISSN: 0925-2312
Online Access: Full text
Description
Summary: Clustering large, high-dimensional document data has attracted great interest. However, current clustering algorithms lack efficient representation learning. Incorporating deep learning techniques into document clustering can strengthen the learning process. In this work, on the one hand, we address the representation learning problem by preserving important information from the initial data while pushing the original samples and their augmentations together. On the other hand, we handle cluster locality preservation by pushing neighboring data points together. To that end, we first introduce contractive autoencoders. We then propose a deep embedding clustering framework based on the contractive autoencoder (DECCA) to learn document representations. Furthermore, to capture relevant document or word features, we append the Frobenius norm as a penalty term to the conventional autoencoder objective, which helps the autoencoder perform better. In this way, the contractive autoencoder captures the local manifold structure of the input data and competes with the representations learned by existing methods. Finally, we confirm the superiority of the proposed algorithm over state-of-the-art results on six real-world image and text datasets.
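The Frobenius-norm penalty mentioned in the abstract is, in the standard contractive autoencoder formulation, the squared Frobenius norm of the encoder's Jacobian with respect to the input. A minimal NumPy sketch for a one-layer sigmoid encoder (an illustrative assumption, not the paper's actual DECCA architecture):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def contractive_penalty(W, b, x):
    """Squared Frobenius norm of the encoder Jacobian dh/dx for the
    hypothetical one-layer encoder h = sigmoid(W @ x + b)."""
    h = sigmoid(W @ x + b)
    # dh_j/dx_i = h_j * (1 - h_j) * W_ji, so the norm factorizes:
    # ||J||_F^2 = sum_j (h_j * (1 - h_j))^2 * sum_i W_ji^2
    return float(np.sum((h * (1.0 - h)) ** 2 * np.sum(W ** 2, axis=1)))

def cae_loss(W, b, W_dec, b_dec, x, lam=0.1):
    """Reconstruction error plus the contractive penalty, weighted by lam."""
    h = sigmoid(W @ x + b)
    x_hat = W_dec @ h + b_dec  # linear decoder, for simplicity
    return float(np.sum((x - x_hat) ** 2)) + lam * contractive_penalty(W, b, x)
```

Minimizing `cae_loss` makes the encoder locally insensitive to small input perturbations, which is how the contractive term preserves the local manifold structure of the data.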
DOI: 10.1016/j.neucom.2020.12.094