DBHC: A DBSCAN-based hierarchical clustering algorithm
Clustering is the process of partitioning objects of a dataset into some groups according to similarities and dissimilarities between its objects. DBSCAN is one of the most important clustering algorithms in the density based approach of clustering. In spite of the numerous advantages of the DBSCAN...
Saved in:
| Published in: | Data & knowledge engineering Vol. 135; p. 101922 |
|---|---|
| Main Authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: |
Elsevier B.V
01.09.2021
|
| Subjects: | |
| ISSN: | 0169-023X, 1872-6933 |
| Online Access: | Get full text |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| Summary: | Clustering is the process of partitioning objects of a dataset into some groups according to similarities and dissimilarities between its objects. DBSCAN is one of the most important clustering algorithms in the density based approach of clustering. In spite of the numerous advantages of the DBSCAN algorithm, it has two important input parameters, MinPts and Eps, which determining their values is still a great challenge. This problem arises because values of these parameters are heavily dependent on data distribution. To overcome this challenge, firstly features of these parameters are investigated and the data distribution are analyzed. Then a DBSCAN-based hierarchical clustering (DBHC) method is proposed in this paper in order to fix this challenge. For this purpose, DBHC first determines values of these parameters using the notion of k nearest neighbor and k-dist plot. Because most of the real world data is not distributed uniformly, it is needed to be produced several values for the Eps parameter. Then, DBHC executes the DBSCAN algorithm several times based on the number of Eps produced earlier. Finally, DBHC method merges obtained clusters if the number of produced clusters is larger than the number which has estimated by the user. To evaluate the performance of the DBHC method, several experiments were performed on some of benchmark datasets of UCI database. Obtained results were compared with other previous works. The obtained results consistently showed that the DBHC method led to better results in comparison to the other works. |
|---|---|
| ISSN: | 0169-023X 1872-6933 |
| DOI: | 10.1016/j.datak.2021.101922 |