Hierarchical clustering algorithm based on Crystallized neighborhood graph for identifying complex structured datasets

Bibliographic details
Published in: Expert Systems with Applications, Vol. 265, p. 125714
Main authors: Chen, Zhongshang; Feng, Ji; Yang, Degang; Cai, Fapeng
Format: Journal Article
Language: English
Published: Elsevier Ltd, 15.03.2025
ISSN:0957-4174
Description
Summary: In data mining, the neighborhood graph is an important method for describing the distribution of datasets. However, existing neighborhood graph methods are often sensitive to parameter settings and to the presence of outliers. These traditional neighborhood graphs typically require one or more input parameters, and they may not function optimally when applied to complex datasets that include a substantial number of noise points. To overcome these drawbacks, we draw inspiration from crude salt purification and propose a parameter-free neighborhood graph method named the Crystallized neighborhood graph (CNG). This method can adaptively capture the distribution structure of complex structured datasets. Based on the CNG, we propose a Hierarchical clustering algorithm based on the Crystallized neighborhood graph for identifying complex structured datasets (HCCNG). It redefines the similarity between sub-graphs using the bridges between sub-graphs and the shortest distance between them. Sub-graphs are then repeatedly merged according to their similarity until the ideal clustering result is achieved. The experimental results show that the HCCNG algorithm can identify not only popular clusters, but also variable-density spherical clusters. Moreover, it performs well on complex structured datasets with a significant amount of noise.
• A new neighborhood graph is used to identify impurity points.
• A new parameter-free neighborhood graph.
• A new method for measuring the similarity between sub-graphs.
• The method can handle datasets with different densities and shapes.
• The method only requires one integer parameter and is easy to use.
DOI:10.1016/j.eswa.2024.125714
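
The merging step described in the summary (sub-graphs joined repeatedly according to a similarity built from bridges and shortest distance) can be illustrated with a rough sketch. This is not the authors' HCCNG or CNG: the record does not give their definitions, so everything below is an assumption made for illustration. A mutual k-nearest-neighbor graph stands in for the CNG, a "bridge" is taken to be a one-directional k-NN link crossing two sub-graphs, the similarity formula is hypothetical, and the names `knn_lists`, `components`, `similarity`, and `merge_subgraphs` are invented here.

```python
import math


def knn_lists(points, k):
    """For each point, the indices of its k nearest neighbors (Euclidean)."""
    n = len(points)
    return [
        sorted((j for j in range(n) if j != i),
               key=lambda j: math.dist(points[i], points[j]))[:k]
        for i in range(n)
    ]


def components(n, edges):
    """Connected components of an undirected graph, via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def similarity(a, b, points, nbrs):
    """Hypothetical similarity: more bridges and a smaller gap score higher."""
    set_a, set_b = set(a), set(b)
    # Assumed notion of a bridge: a directed k-NN link crossing the two groups.
    bridges = sum(1 for i in a for j in nbrs[i] if j in set_b)
    bridges += sum(1 for i in b for j in nbrs[i] if j in set_a)
    # Shortest distance between any pair of points from the two groups.
    gap = min(math.dist(points[i], points[j]) for i in a for j in b)
    return (1 + bridges) / (gap + 1e-12)


def merge_subgraphs(points, k, n_clusters):
    """Merge the most similar pair of sub-graphs until n_clusters remain."""
    nbrs = knn_lists(points, k)
    # Stand-in for the CNG: keep only mutual k-NN edges.
    mutual = [(i, j) for i in range(len(points))
              for j in nbrs[i] if i < j and i in nbrs[j]]
    groups = components(len(points), mutual)
    while len(groups) > n_clusters:
        pairs = ((x, y) for x in range(len(groups))
                 for y in range(x + 1, len(groups)))
        a, b = max(pairs, key=lambda p: similarity(
            groups[p[0]], groups[p[1]], points, nbrs))
        groups[a] = groups[a] + groups[b]
        del groups[b]
    return [sorted(g) for g in groups]
```

For example, `merge_subgraphs(points, k=2, n_clusters=2)` on three well-separated groups of points would join the two closest groups and leave the third on its own. The stopping rule here is a fixed cluster count, which is also an assumption; the summary only says merging continues until the ideal clustering result is achieved.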