A structure noise-aware tensor dictionary learning method for high-dimensional data clustering
| Published in: | Information Sciences, vol. 612, pp. 87-106 |
|---|---|
| Main authors: | , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Inc, 1 October 2022 |
| Subjects: | |
| ISSN: | 0020-0255, 1872-6291 |
| Online access: | Full text |
| Abstract: | With the development of data acquisition technology, high-dimensional data clustering has become an important yet challenging task in data mining. Despite the advances achieved by current clustering methods, they can be further improved. First, many of them unfold the high-dimensional data into a large matrix, which destroys the intrinsic structure of the data. Second, some methods assume that the noise in the dataset follows a predefined distribution (e.g., the Gaussian or Laplacian distribution), an assumption that often fails in real-world applications and ultimately degrades clustering performance. To address these issues, we propose a novel tensor dictionary learning method for clustering high-dimensional data in the presence of structure noise. We adopt tensors, the natural generalization of vectors and matrices, to represent high-dimensional data. To model the noise accurately, we decompose the observed data into clean data, structure noise, and Gaussian noise. We use low-rank tensor modeling to capture the inherent correlations of the clean data, and adopt tensor dictionary learning to describe the structure noise adaptively rather than through a predefined distribution. We design a proximal alternating minimization algorithm to solve the proposed model, with a theoretical convergence guarantee. Experimental results on both simulated and real datasets show that the proposed method outperforms the compared methods for high-dimensional data clustering. |
|---|---|
| DOI: | 10.1016/j.ins.2022.08.081 |