Hierarchical clustering algorithm based on Crystallized neighborhood graph for identifying complex structured datasets
| Published in: | Expert systems with applications Vol. 265; p. 125714 |
|---|---|
| Main Authors: | , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 15.03.2025 |
| Subjects: | |
| ISSN: | 0957-4174 |
| Summary: | In data mining, the neighborhood graph is an important tool for describing the distribution of a dataset. However, existing neighborhood graph methods are often sensitive to parameter settings and to outliers: they typically require one or more input parameters and may perform poorly on complex datasets that contain many noise points. To overcome these drawbacks, we take inspiration from crude salt purification and propose a parameter-free neighborhood graph, the Crystallized neighborhood graph (CNG), which adaptively captures the distribution structure of complex structured datasets. Building on the CNG, we propose a Hierarchical clustering algorithm based on the Crystallized neighborhood graph for identifying complex structured datasets (HCCNG). HCCNG redefines the similarity between sub-graphs using the bridges between them and the shortest distance between them, and then repeatedly merges sub-graphs according to this similarity until the desired clustering result is obtained. Experimental results show that HCCNG identifies not only manifold clusters but also spherical clusters of varying density, and that it performs well on complex structured datasets containing a large amount of noise. |
| Highlights: | • A new neighborhood graph is used to identify impurity points. • A new parameter-free neighborhood graph. • A new method for measuring the similarity between sub-graphs. • The method can handle datasets with different densities and shapes. • The method requires only one integer parameter, which makes it easy to use. |
| DOI: | 10.1016/j.eswa.2024.125714 |
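The summary describes the merge stage only in words. The following is an illustrative Python sketch of that general loop: start from sub-graphs, score every pair by its bridges and shortest gap, and merge the best pair until a target count remains. It is not the authors' HCCNG. The CNG construction is not given here, so a mutual k-nearest-neighbour graph stands in for it. The similarity `(1 + bridges) / gap`, the hypothetical helper names (`knn_lists`, `components`, `similarity`, `merge_subgraphs`), the neighbourhood size `k`, and the reading of the single integer parameter as the target number of clusters are all assumptions.

```python
import math
from itertools import combinations


def knn_lists(points, k):
    """Brute-force list of the k nearest neighbours (by index) of every point."""
    n = len(points)
    return [
        sorted((j for j in range(n) if j != i),
               key=lambda j: math.dist(points[i], points[j]))[:k]
        for i in range(n)
    ]


def components(n, edges):
    """Connected components via union-find; returns a list of index sets."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())


def similarity(a, b, knn, points):
    """HYPOTHETICAL score: more bridging kNN edges and a shorter gap => more similar."""
    # "Bridges" here = directed kNN edges that cross from one sub-graph to the other.
    bridges = sum(1 for i in a for j in knn[i] if j in b)
    bridges += sum(1 for i in b for j in knn[i] if j in a)
    # Shortest Euclidean distance between any two points of the two sub-graphs.
    gap = min(math.dist(points[i], points[j]) for i in a for j in b)
    return (1 + bridges) / (gap + 1e-12)  # epsilon guards duplicate points


def merge_subgraphs(points, n_clusters, k=3):
    """Greedy hierarchical merging of sub-graphs until n_clusters remain."""
    knn = knn_lists(points, k)
    # Stand-in for CNG (ASSUMPTION): components of the mutual kNN graph.
    mutual = [(i, j) for i in range(len(points)) for j in knn[i] if i in knn[j]]
    clusters = components(len(points), mutual)
    while len(clusters) > n_clusters:
        # Pick the most similar pair; combinations yields a < b.
        a, b = max(combinations(range(len(clusters)), 2),
                   key=lambda p: similarity(clusters[p[0]], clusters[p[1]], knn, points))
        clusters[a] |= clusters[b]
        del clusters[b]  # safe: b > a, so index a is unaffected
    return clusters


if __name__ == "__main__":
    blob_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
    blob_b = [(3, 0), (3, 1), (4, 0), (4, 1)]
    blob_c = [(20, 20), (20, 21), (21, 20), (21, 21)]
    pts = blob_a + blob_b + blob_c
    print(sorted(sorted(c) for c in merge_subgraphs(pts, n_clusters=2)))
    # → [[0, 1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11]]
```

In this toy run the three blobs form three initial sub-graphs. The two nearby blobs have no bridges but the smallest gap, so they merge first. The brute-force neighbour search is quadratic and only meant for small illustrative inputs.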