CDC: A Simple Framework for Complex Data Clustering

Detailed Bibliography
Published in:	IEEE Transactions on Neural Networks and Learning Systems, Vol. 36, No. 7, pp. 13177-13188
Main authors:	Kang, Zhao; Xie, Xuanting; Li, Bingheng; Pan, Erlin
Format:	Journal Article
Language:	English
Publication details:	IEEE (United States), 1 July 2025
Subjects:	Anchor graph; Clustering methods; Complexity theory; Laplace equations; large-scale data; Learning systems; multiview learning; Noise; Scalability; Topology; topology structure; Training; Uncertainty; Vectors
ISSN:	2162-237X, 2162-2388

Description
Abstract:	In today's data-driven digital era, the amount and complexity of collected data, such as multiview, non-Euclidean, and multirelational data, are growing exponentially or even faster. Clustering, which extracts valid knowledge from data without supervision, is extremely useful in practice. However, existing methods are developed independently to handle one particular challenge at the expense of the others. In this work, we propose a simple but effective framework for complex data clustering (CDC) that can efficiently process different types of data with linear complexity. We first use graph filtering (GF) to fuse geometric structure and attribute information. We then reduce complexity with high-quality anchors that are adaptively learned via a novel similarity-preserving (SP) regularizer. We illustrate the clusterability of the proposed method theoretically and experimentally. In particular, we deploy CDC on graph data of size 111M.
DOI:	10.1109/TNNLS.2024.3473618
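The abstract names two ingredients, graph filtering to fuse structure with attributes and anchors to cut cost, without giving formulas. The Python sketch below is a generic, minimal illustration of how those two ideas are commonly combined. It is not the authors' CDC algorithm. The following are all assumptions: the low-pass filter H = I - L/2 (L the normalized Laplacian with self-loops), the number of passes k, the Gaussian kernel width sigma, the anchor count m, and every function name (graph_filter, farthest_points, anchor_graph, spectral_embed, kmeans). The anchors also differ from the paper's: they come from farthest-point sampling, standing in for the SP-regularized anchor learning, which this record does not describe. Dense NumPy arrays are used for readability.

```python
import numpy as np

def graph_filter(A, X, k=2):
    """Apply an ASSUMED low-pass filter H = I - L/2 to the attributes X, k times.

    L is the symmetric normalized Laplacian of A with self-loops, so
    H = (I + D^{-1/2} (A + I) D^{-1/2}) / 2. Each pass averages a node's
    attributes with those of its neighbors (dense here; sparse in practice).
    """
    n = A.shape[0]
    A_hat = A + np.eye(n)
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    S = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    H = 0.5 * (np.eye(n) + S)
    for _ in range(k):
        X = H @ X
    return X

def farthest_points(X, m):
    """Greedy farthest-point sampling: m row indices, starting from row 0."""
    idx = [0]
    dist = ((X - X[0]) ** 2).sum(axis=1)
    for _ in range(m - 1):
        nxt = int(dist.argmax())
        idx.append(nxt)
        dist = np.minimum(dist, ((X - X[nxt]) ** 2).sum(axis=1))
    return np.array(idx)

def anchor_graph(Xf, m, sigma=1.0):
    """Row-stochastic n x m Gaussian similarity between nodes and m anchors.

    Anchors are picked by farthest-point sampling, a stand-in for the
    paper's learned anchors (hypothetical choice).
    """
    anchors = Xf[farthest_points(Xf, m)]
    sq = ((Xf[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    Z = np.exp(-sq / (2.0 * sigma ** 2))
    return Z / Z.sum(axis=1, keepdims=True)

def spectral_embed(Z, c):
    """Top-c left singular vectors of Z diag(Z^T 1)^{-1/2}; SVD costs O(n m^2)."""
    lam = Z.sum(axis=0)
    U, _, _ = np.linalg.svd(Z / np.sqrt(lam)[None, :], full_matrices=False)
    return U[:, :c]

def kmeans(E, c, iters=50):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = E[farthest_points(E, c)]
    for _ in range(iters):
        labels = ((E[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        for j in range(c):
            if np.any(labels == j):
                centers[j] = E[labels == j].mean(axis=0)
    return labels

# Toy usage: two disconnected 10-node cliques with noisy 2-D attributes.
rng = np.random.default_rng(0)
A = np.zeros((20, 20))
A[:10, :10] = 1.0
A[10:, 10:] = 1.0
np.fill_diagonal(A, 0.0)
X = np.vstack([rng.normal(0.0, 0.3, (10, 2)), rng.normal(5.0, 0.3, (10, 2))])

Xf = graph_filter(A, X, k=2)
Z = anchor_graph(Xf, m=4)
labels = kmeans(spectral_embed(Z, c=2), c=2)
print(labels)
```

On this toy graph, each filtering pass halves every node's deviation from its clique's mean attribute vector, and the two cliques are recovered as two clusters. The cost saving in this sketch comes from the anchor step: Z holds only n x m entries instead of an n x n affinity matrix. The dense H used here is still quadratic. Storing it as a sparse matrix would make each pass proportional to the number of edges.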