Large-Scale Subspace Clustering by Independent Distributed and Parallel Coding

Bibliographic Details
Published in:	IEEE Transactions on Cybernetics, Vol. 52, No. 9, pp. 9090-9100
Main Authors:	Li, Jun; Tao, Zhiqiang; Wu, Yue; Zhong, Bineng; Fu, Yun
Format:	Journal Article
Language:	English
Published:	United States: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 1 September 2022
Subjects:	Big Data; Clustering; Clustering methods; Columns (structural); Data points; Dictionaries; Distributed and parallel computing; Distributed databases; least-squares regression (LSR); low-rank representation (LRR); Massive data points; Matrix decomposition; Multimedia; Optimization; over-high dimensional big data; Regularization; Sparse matrices; sparse subspace clustering (SSC); subspace clustering; Subspaces; Video
ISSN:	2168-2267, 2168-2275

Description
Summary:	Subspace clustering is a popular method for discovering the underlying low-dimensional structures of high-dimensional multimedia data (e.g., images, videos, and texts). In this article, we consider a large-scale subspace clustering (LS²C) problem, that is, partitioning a million data points with a million dimensions. To address this, we explore an independent distributed and parallel framework by dividing the big data/variable matrices and the regularization by both columns and rows. Specifically, LS²C is independently decomposed into many subproblems by distributing those matrices across different machines by columns, since the regularization of the code matrix equals the sum of the regularization of its submatrices (e.g., the squared Frobenius norm or the $\ell_1$-norm). Consensus optimization is designed to solve these subproblems in parallel to save communication costs. Moreover, we provide theoretical guarantees that LS²C can recover consensus subspace representations of high-dimensional data points under broad conditions. Compared with state-of-the-art LS²C methods, our approach achieves better clustering results on public datasets, including a million images and videos.
DOI:	10.1109/TCYB.2021.3052056
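
The column-wise decomposition described in the summary rests on the regularizer being separable over column blocks of the code matrix. The following is a minimal sketch of that property only, not the authors' algorithm: it assumes a fixed, shared dictionary `D` and a plain least-squares regression (LSR) objective, and the helper names `lsr_codes` and `lsr_codes_by_column_blocks` are hypothetical. It omits the row-wise split and the consensus optimization entirely.

```python
import numpy as np

def lsr_codes(D, X, lam):
    """Closed-form ridge codes: argmin_C ||X - D C||_F^2 + lam * ||C||_F^2.

    Assumption for illustration: D is a fixed dictionary shared by all workers.
    """
    k = D.shape[1]
    return np.linalg.solve(D.T @ D + lam * np.eye(k), D.T @ X)

def lsr_codes_by_column_blocks(D, X, lam, n_blocks):
    """Solve the same problem independently on column blocks of X.

    Each block could live on a different machine; here they run in a loop.
    """
    blocks = np.array_split(X, n_blocks, axis=1)
    return [lsr_codes(D, Xj, lam) for Xj in blocks]

# Synthetic data (hypothetical sizes, far smaller than the paper's setting).
rng = np.random.default_rng(0)
D = rng.standard_normal((50, 10))   # dictionary: 50 dims, 10 atoms
X = rng.standard_normal((50, 200))  # data: 200 points as columns
lam = 0.1

C_full = lsr_codes(D, X, lam)
C_blocks = lsr_codes_by_column_blocks(D, X, lam, n_blocks=4)
C_joined = np.hstack(C_blocks)

# Separability of the regularizers over column blocks.
frob_full = np.sum(C_full ** 2)
frob_sum = sum(np.sum(Cj ** 2) for Cj in C_blocks)
l1_full = np.abs(C_full).sum()
l1_sum = sum(np.abs(Cj).sum() for Cj in C_blocks)

print(np.allclose(C_full, C_joined))   # block solutions match the full solve
print(np.isclose(frob_full, frob_sum)) # squared Frobenius norm is a block sum
print(np.isclose(l1_full, l1_sum))     # l1 norm is a block sum
```

Because both the data-fit term and the regularizer split over columns in this simplified setting, the blocks need no communication at all; the consensus step mentioned in the summary becomes necessary once variables are also shared across the row-wise split.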