Revisiting agglomerative clustering


Bibliographic details
Published in: Physica A, Volume 585; p. 126433
Main authors: Tokuda, Eric K., Comin, Cesar H., Costa, Luciano da F.
Format: Journal Article
Language: English
Published: Elsevier B.V. 01.01.2022
ISSN: 0378-4371, 1873-2119
Description
Summary: Hierarchical agglomerative methods stand out as particularly effective and popular approaches for clustering data. Yet these methods have not been systematically compared with regard to the important issue of false positives when searching for clusters. A model of clusters consisting of a higher-density nucleus surrounded by a transition region, followed by outliers, is adopted as a means to quantify the relevance of the obtained clusters and to address the problem of false positives. Six traditional methodologies, namely the single, average, median, complete, centroid and Ward's linkage criteria, are compared with respect to the adopted model. Unimodal and bimodal datasets obeying uniform, Gaussian, exponential and power-law distributions are considered for this comparison. The results include the verification that many methods detect two clusters in unimodal data. The single-linkage method was found to be more resilient to false positives. Also, several methods detected clusters that do not correspond directly to the nucleus.
• Six classical agglomerative clustering methods are compared regarding false positives.
• Single linkage led to fewer false positives in unimodal distributions.
• Single linkage yielded clusters corresponding more closely to the nuclei.
DOI: 10.1016/j.physa.2021.126433
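
The six linkage criteria named in the summary differ only in how the distance to a newly merged cluster is updated. The sketch below is an illustration, not the authors' code: it implements those updates with the standard Lance-Williams recurrences in plain Python, cuts a unimodal sample into two clusters, and prints the cluster sizes for each method. The function names `lance_williams` and `agglomerate`, the 2-D standard Gaussian sample, its size (80) and the seed are all assumptions chosen for illustration. The paper's nucleus/transition/outlier model and its relevance measure are not reproduced here.

```python
import random

METHODS = ("single", "complete", "average", "centroid", "median", "ward")
# These three updates are defined on squared Euclidean distances.
SQUARED = {"centroid", "median", "ward"}


def lance_williams(method, d_ki, d_kj, d_ij, n_i, n_j, n_k):
    """Distance from cluster k to the cluster formed by merging i and j."""
    if method == "single":      # equivalent to the Lance-Williams form with gamma = -1/2
        return min(d_ki, d_kj)
    if method == "complete":    # equivalent to the Lance-Williams form with gamma = +1/2
        return max(d_ki, d_kj)
    if method == "average":
        return (n_i * d_ki + n_j * d_kj) / (n_i + n_j)
    if method == "centroid":
        n = n_i + n_j
        return (n_i * d_ki + n_j * d_kj) / n - n_i * n_j * d_ij / n ** 2
    if method == "median":
        return 0.5 * d_ki + 0.5 * d_kj - 0.25 * d_ij
    if method == "ward":
        n = n_i + n_j + n_k
        return ((n_i + n_k) * d_ki + (n_j + n_k) * d_kj - n_k * d_ij) / n
    raise ValueError(f"unknown method: {method}")


def agglomerate(points, method, n_clusters=2):
    """Repeatedly merge the closest pair of clusters until n_clusters remain.

    Returns a list of clusters, each a list of point indices.
    """
    members = {i: [i] for i in range(len(points))}
    dist = {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            dist[(i, j)] = d2 if method in SQUARED else d2 ** 0.5

    while len(members) > n_clusters:
        # Keys are stored as (smaller id, larger id), so i < j here.
        (i, j), d_ij = min(dist.items(), key=lambda kv: kv[1])
        n_i, n_j = len(members[i]), len(members[j])
        for k in members:
            if k in (i, j):
                continue
            key_i = (min(i, k), max(i, k))
            key_j = (min(j, k), max(j, k))
            d_ki, d_kj = dist.pop(key_i), dist.pop(key_j)
            # The merged cluster keeps id i.
            dist[key_i] = lance_williams(
                method, d_ki, d_kj, d_ij, n_i, n_j, len(members[k])
            )
        del dist[(i, j)]
        members[i] += members.pop(j)
    return list(members.values())


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, arbitrary choice
    # Unimodal sample: a single 2-D Gaussian blob, so any 2-way split is spurious.
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(80)]
    for m in METHODS:
        sizes = sorted(len(c) for c in agglomerate(data, m))
        print(f"{m:>8}: cluster sizes {sizes}")
```

A strongly unbalanced split (one very small cluster and one very large one) suggests that the method has only peeled off outliers, whereas a balanced split of unimodal data is closer to the false-positive case the summary describes. This cubic-time loop is intended for small samples only.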