A multiscale environment for learning by diffusion

Detailed bibliographic record
Published in:	Applied and Computational Harmonic Analysis, Vol. 57, pp. 58-100
Main authors:	Murphy, James M.; Polk, Sam L.
Format:	Journal Article
Language:	English
Published:	Elsevier Inc., 1 March 2022
Subjects:	Clustering; Diffusion geometry; Hierarchical clustering; Machine learning; Spectral graph theory
ISSN:	1063-5203

Description
Summary:	Highlights:
• We introduce the MELD data model: a diffusion framework for multiscale clustering.
• We show how cluster coherence and separation interact with diffusion in MELD clusterings.
• We introduce the M-LUND multiscale clustering algorithm and guarantee its performance.
• We guarantee that M-LUND recovers the MELD data model from many datasets.
• We show M-LUND extracts latent multiscale structure in synthetic and real datasets.

Clustering algorithms partition a dataset into groups of similar points. The clustering problem is very general, and different partitions of the same dataset could be considered correct and useful. To fully understand such data, it must be considered at a variety of scales, ranging from coarse to fine. We introduce the Multiscale Environment for Learning by Diffusion (MELD) data model, which is a family of clusterings parameterized by nonlinear diffusion on the dataset. We show that the MELD data model precisely captures latent multiscale structure in data and facilitates its analysis. To efficiently learn the multiscale structure observed in many real datasets, we introduce the Multiscale Learning by Unsupervised Nonlinear Diffusion (M-LUND) clustering algorithm, which is derived from a diffusion process at a range of temporal scales. We provide theoretical guarantees for the algorithm's performance and establish its computational efficiency. Finally, we show that the M-LUND clustering algorithm detects the latent structure in a range of synthetic and real datasets.
DOI:	10.1016/j.acha.2021.11.004
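The summary describes a family of clusterings parameterized by diffusion on the data at a range of temporal scales. As an illustration only, the following NumPy sketch builds such a family on a toy dataset. It is not the authors' M-LUND and carries none of its guarantees. Every element here is an assumption chosen to keep the example compact: the Gaussian bandwidth `sigma`, the use of the kernel degree as a crude density estimate, the mode score (density times diffusion distance to the nearest denser point), the largest-ratio rule for picking the number of clusters, and the hypothetical function names.

```python
import numpy as np


def diffusion_distances(X, sigma, t):
    """Pairwise diffusion distances D_t for the rows of X (hypothetical helper).

    Builds Gaussian affinities, row-normalizes them into a Markov matrix P,
    and compares rows of P^t weighted by the stationary distribution pi.
    Also returns pi, used here as a crude density estimate (an assumption).
    """
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq / sigma**2)                  # Gaussian kernel affinities
    deg = W.sum(axis=1)
    P = W / deg[:, None]                         # random-walk transition matrix
    pi = deg / deg.sum()                         # stationary distribution of P
    Q = np.linalg.matrix_power(P, t) / np.sqrt(pi)[None, :]
    D = np.sqrt(((Q[:, None, :] - Q[None, :, :]) ** 2).sum(axis=-1))
    return D, pi


def clustering_at_time(X, t, sigma=1.0, k_max=5):
    """One clustering at diffusion time t; returns (number of clusters, labels)."""
    n = X.shape[0]
    D, density = diffusion_distances(X, sigma, t)
    order = np.argsort(-density, kind="stable")  # highest density first
    rho = np.empty(n)
    for pos, i in enumerate(order):
        # Diffusion distance to the nearest denser point (max distance for the densest).
        rho[i] = D[i].max() if pos == 0 else D[i, order[:pos]].min()
    score = density * rho
    by_score = np.argsort(-score, kind="stable")
    s = score[by_score]
    k_max = min(k_max, n - 1)
    # Guard against division by zero when a score is exactly 0.
    ratios = s[:k_max] / np.maximum(s[1:k_max + 1], 1e-300)
    K = int(np.argmax(ratios)) + 1               # largest gap among sorted scores
    modes = by_score[:K]
    labels = np.full(n, -1)
    labels[modes] = np.arange(K)
    for pos, i in enumerate(order):
        if labels[i] >= 0:
            continue
        # Every denser point is already labeled; copy the label of the nearest one.
        cand = order[:pos] if pos > 0 else modes
        labels[i] = labels[cand[np.argmin(D[i, cand])]]
    return K, labels


def multiscale_clusterings(X, times, sigma=1.0, k_max=5):
    """Family of clusterings indexed by diffusion time t (hypothetical helper)."""
    return {t: clustering_at_time(X, t, sigma, k_max) for t in times}


# Toy data: a 3-point group near 0 and a 4-point group near 10.
X = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2], [10.3]])
for t, (K, labels) in multiscale_clusterings(X, times=(1, 2), k_max=3).items():
    print(f"t={t}: K={K}, labels={labels.tolist()}")
```

Each diffusion time `t` yields its own partition; on this toy data both `t = 1` and `t = 2` separate the two groups. Because `P^t` and all pairwise distances are formed densely, the sketch only suits small datasets, and for large `t` the within-cluster distances can shrink to floating-point noise.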