Comprehensive survey on hierarchical clustering algorithms and the recent developments

Data clustering is a commonly used data processing technique in many fields, which divides objects into different clusters in terms of some similarity measure between data points. Comparing to partitioning clustering methods which give a flat partition of the data, hierarchical clustering methods ca...

Celý popis

Uloženo v:
Podrobná bibliografie
Vydáno v:The Artificial intelligence review Ročník 56; číslo 8; s. 8219 - 8264
Hlavní autoři: Ran, Xingcheng, Xi, Yue, Lu, Yonggang, Wang, Xiangwen, Lu, Zhenyu
Médium: Journal Article
Jazyk:angličtina
Vydáno: Dordrecht Springer Netherlands 01.08.2023
Springer
Springer Nature B.V
Témata:
ISSN:0269-2821, 1573-7462
On-line přístup:Získat plný text
Tagy: Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
Popis
Shrnutí:Data clustering is a commonly used data processing technique in many fields, which divides objects into different clusters in terms of some similarity measure between data points. Comparing to partitioning clustering methods which give a flat partition of the data, hierarchical clustering methods can give multiple consistent partitions of the data at different levels for the same data without rerunning clustering, it can be used to better analyze the complex structure of the data. There are usually two kinds of hierarchical clustering methods: divisive and agglomerative. For the divisive clustering, the key issue is how to select a cluster for the next splitting procedure according to dissimilarity and how to divide the selected cluster. For agglomerative hierarchical clustering, the key issue is the similarity measure that is used to select the two most similar clusters for the next merge. Although both types of the methods produce the dendrogram of the data as output, the clustering results may be very different depending on the dissimilarity or similarity measure used in the clustering, and different types of methods should be selected according to different types of the data and different application scenarios. So, we have reviewed various hierarchical clustering methods comprehensively, especially the most recently developed methods, in this work. The similarity measure plays a crucial role during hierarchical clustering process, we have reviewed different types of the similarity measure along with the hierarchical clustering. More specifically, different types of hierarchical clustering methods are comprehensively reviewed from six aspects, and their advantages and drawbacks are analyzed. The application of some methods in real life is also discussed. Furthermore, we have also included some recent works in combining deep learning techniques and hierarchical clustering, which is worth serious attention and may improve the hierarchical clustering significantly in the future.
Bibliografie:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:0269-2821
1573-7462
DOI:10.1007/s10462-022-10366-3