Distributed Center-Based Clustering: A Unified Framework

Published in: IEEE Transactions on Signal Processing, Volume 73, pp. 903-918
Main Authors: Armacki, Aleksandar; Bajovic, Dragana; Jakovetic, Dusan; Kar, Soummya
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2025
ISSN: 1053-587X, 1941-0476
Description
Abstract: We develop a family of distributed center-based clustering algorithms that work over connected networks of users. In the proposed scenario, users contain a local dataset and communicate only with their immediate neighbours, with the aim of finding a clustering of the full, joint data. The proposed family, termed Distributed Gradient Clustering (DGC-$\mathcal{F}_{\rho}$), is parametrized by $\rho \geq 1$, controlling the proximity of users' center estimates, with $\mathcal{F}$ determining the clustering loss. Our framework allows for a broad class of smooth convex loss functions, including popular clustering losses like $K$-means and Huber loss. Specialized to $K$-means and Huber loss, DGC-$\mathcal{F}_{\rho}$ gives rise to novel distributed clustering algorithms DGC-KM$_{\rho}$ and DGC-HL$_{\rho}$, while novel clustering losses based on the logistic and fair loss lead to DGC-LL$_{\rho}$ and DGC-FL$_{\rho}$. We provide a unified analysis and establish several strong results, under mild assumptions. First, the sequence of centers generated by the methods converges to a well-defined notion of fixed point, under any center initialization and value of $\rho$.
Second, as $\rho$ increases, the family of fixed points produced by DGC-$\mathcal{F}_{\rho}$ converges to a notion of consensus fixed points. We show that consensus fixed points of DGC-$\mathcal{F}_{\rho}$ are equivalent to fixed points of gradient clustering over the full data, guaranteeing that a clustering of the full data is produced. For the special case of Bregman losses, we show that our fixed points converge to the set of Lloyd points. Numerical experiments on real data confirm our theoretical findings and demonstrate strong performance of the methods.
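The scheme described in the abstract — each user taking a gradient step on its local clustering loss while a $\rho$-weighted proximity term pulls its center estimates toward those of its neighbours — can be illustrated with a toy sketch. This is not the authors' DGC-KM$_{\rho}$; the function names, the quadratic consensus penalty, and the step sizes are assumptions made for illustration, using the $K$-means loss:

```python
import numpy as np

def local_km_grad(X, centers):
    # Gradient of the K-means loss on local data X:
    # each point pulls its nearest center toward it.
    grad = np.zeros_like(centers)
    for x in X:
        j = np.argmin(np.linalg.norm(centers - x, axis=1))
        grad[j] += centers[j] - x
    return grad / len(X)

def dgc_km_sketch(datasets, adjacency, K, rho=1.0, step=0.1, iters=200, seed=0):
    """Illustrative distributed gradient clustering with the K-means loss.

    Each user i keeps its own K centers and, at every iteration, combines a
    local gradient step with a consensus pull toward its neighbours' center
    estimates, weighted by rho (an assumed quadratic proximity penalty).
    """
    rng = np.random.default_rng(seed)
    m = len(datasets)
    d = datasets[0].shape[1]
    centers = [rng.standard_normal((K, d)) for _ in range(m)]
    for _ in range(iters):
        grads = [local_km_grad(datasets[i], centers[i]) for i in range(m)]
        new = []
        for i in range(m):
            nbrs = [j for j in range(m) if adjacency[i][j] and j != i]
            # Average pull toward neighbours' center estimates (consensus term).
            cons = sum(centers[j] - centers[i] for j in nbrs) / max(len(nbrs), 1)
            new.append(centers[i] - step * grads[i] + step * rho * cons)
        centers = new
    return centers
```

As $\rho$ grows, the consensus term dominates and the users' center estimates are driven together, mirroring the abstract's claim that the fixed points approach consensus fixed points for large $\rho$.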
DOI: 10.1109/TSP.2025.3531292