A structure noise-aware tensor dictionary learning method for high-dimensional data clustering


Bibliographic details
Published in:	Information Sciences, Vol. 612, pp. 87-106
Main authors:	Yang, Jing-Hua; Chen, Chuan; Dai, Hong-Ning; Fu, Le-Le; Zheng, Zibin
Format:	Journal Article
Language:	English
Published:	Elsevier Inc, 1 October 2022
Subjects:	High-dimensional data clustering; Proximal alternating minimization; Structural sparsity; Structure noise; Tensor dictionary learning; Tensor low-rank representation
ISSN:	0020-0255, 1872-6291

Description
Abstract:	With the development of data acquisition technology, high-dimensional data clustering has become an important yet challenging task in data mining. Despite the advances achieved by current clustering methods, they can be further improved. First, many of them unfold the high-dimensional data into a large matrix, which destroys its intrinsic structure. Second, some methods assume that the noise in the dataset follows a predefined distribution (e.g., the Gaussian or Laplacian distribution), an assumption that often fails in real-world applications and eventually degrades clustering performance. To address these issues, we propose a novel tensor dictionary learning method for clustering high-dimensional data in the presence of structure noise. We adopt tensors, natural and powerful generalizations of vectors and matrices, to characterize high-dimensional data. Meanwhile, to model the noise accurately, we decompose the observed data into clean data, structure noise, and Gaussian noise. Furthermore, we use low-rank tensor modeling to characterize the inherent correlations of the clean data and adopt tensor dictionary learning to describe the structure noise adaptively and accurately instead of relying on a predefined distribution. We design a proximal alternating minimization algorithm to solve the proposed model with a theoretical convergence guarantee. Experimental results on both simulated and real datasets show that the proposed method outperforms the compared methods for high-dimensional data clustering.
DOI:	10.1016/j.ins.2022.08.081
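
Illustrative note: the abstract describes splitting the observed data into clean data, structure noise, and Gaussian noise, with the clean part modeled as a low-rank tensor. The sketch below is not the authors' algorithm. It only builds a synthetic tensor with that three-part structure and applies one tensor singular value thresholding step, a common building block in low-rank tensor methods. Everything in it is an assumption made for illustration: the t-SVD notion of tensor rank (FFT along the third mode), the names `t_svt` and `tubal_rank`, the sizes, the threshold `tau`, and the hand-made structure noise that corrupts two samples (the paper learns the noise with a tensor dictionary instead).

```python
import numpy as np

def t_svt(Y, tau):
    """Soft-threshold the singular values of every frontal slice in the
    Fourier domain (assumed t-SVD framework, FFT along the third mode)."""
    Yf = np.moveaxis(np.fft.fft(Y, axis=2), 2, 0)      # shape (n3, n1, n2)
    U, s, Vh = np.linalg.svd(Yf, full_matrices=False)  # batched SVD
    s = np.maximum(s - tau, 0.0)                       # shrink singular values
    Xf = (U * s[:, None, :]) @ Vh                      # rebuild each slice
    return np.real(np.fft.ifft(np.moveaxis(Xf, 0, 2), axis=2))

def tubal_rank(X, tol=1e-8):
    """Largest numerical rank among the Fourier-domain frontal slices."""
    Xf = np.moveaxis(np.fft.fft(X, axis=2), 2, 0)
    s = np.linalg.svd(Xf, compute_uv=False)            # shape (n3, min(n1, n2))
    return int(np.max(np.sum(s > tol * s.max(), axis=1)))

rng = np.random.default_rng(0)
n1, n2, n3, r = 20, 15, 5, 2

# Clean data: a tensor of tubal rank r, built as a product of two factors
# slice by slice in the Fourier domain.
A = rng.standard_normal((n1, r, n3))
B = rng.standard_normal((r, n2, n3))
Xf = np.einsum("irk,rjk->ijk", np.fft.fft(A, axis=2), np.fft.fft(B, axis=2))
X = np.real(np.fft.ifft(Xf, axis=2))

# Structure noise (hand-made here): two samples, stored as lateral slices,
# are heavily corrupted while the others are untouched.
S = np.zeros((n1, n2, n3))
S[:, [3, 9], :] = 5.0 * rng.standard_normal((n1, 2, n3))

# Small dense Gaussian noise.
N = 0.01 * rng.standard_normal((n1, n2, n3))

# Observed data as the sum of the three parts.
Y = X + S + N

# One shrinkage step toward a low-rank estimate; tau is an arbitrary choice.
X_hat = t_svt(Y, tau=1.0)
```

A single thresholding step does not separate the structure noise from the clean data. In a full method, a step like this would alternate with updates of the noise terms, which is the role the abstract assigns to proximal alternating minimization.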