DBHC: A DBSCAN-based hierarchical clustering algorithm

Clustering is the process of partitioning objects of a dataset into some groups according to similarities and dissimilarities between its objects. DBSCAN is one of the most important clustering algorithms in the density based approach of clustering. In spite of the numerous advantages of the DBSCAN...

Celý popis

Uloženo v:
Podrobná bibliografie
Vydáno v:Data & knowledge engineering Ročník 135; s. 101922
Hlavní autoři: Latifi-Pakdehi, Alireza, Daneshpour, Negin
Médium: Journal Article
Jazyk:angličtina
Vydáno: Elsevier B.V 01.09.2021
Témata:
ISSN:0169-023X, 1872-6933
On-line přístup:Získat plný text
Tagy: Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
Popis
Shrnutí:Clustering is the process of partitioning objects of a dataset into some groups according to similarities and dissimilarities between its objects. DBSCAN is one of the most important clustering algorithms in the density based approach of clustering. In spite of the numerous advantages of the DBSCAN algorithm, it has two important input parameters, MinPts and Eps, which determining their values is still a great challenge. This problem arises because values of these parameters are heavily dependent on data distribution. To overcome this challenge, firstly features of these parameters are investigated and the data distribution are analyzed. Then a DBSCAN-based hierarchical clustering (DBHC) method is proposed in this paper in order to fix this challenge. For this purpose, DBHC first determines values of these parameters using the notion of k nearest neighbor and k-dist plot. Because most of the real world data is not distributed uniformly, it is needed to be produced several values for the Eps parameter. Then, DBHC executes the DBSCAN algorithm several times based on the number of Eps produced earlier. Finally, DBHC method merges obtained clusters if the number of produced clusters is larger than the number which has estimated by the user. To evaluate the performance of the DBHC method, several experiments were performed on some of benchmark datasets of UCI database. Obtained results were compared with other previous works. The obtained results consistently showed that the DBHC method led to better results in comparison to the other works.
ISSN:0169-023X
1872-6933
DOI:10.1016/j.datak.2021.101922