Hierarchical clustering algorithm based on Crystallized neighborhood graph for identifying complex structured datasets

Bibliographic details
Published in: Expert Systems with Applications, Vol. 265, p. 125714
Main authors: Chen, Zhongshang; Feng, Ji; Yang, Degang; Cai, Fapeng
Format: Journal Article
Language: English
Published: Elsevier Ltd, 15 March 2025
ISSN:0957-4174
Description
Summary: In data mining, the neighborhood graph is an important tool for describing the distribution of a dataset. However, existing neighborhood graph methods are often sensitive to parameter settings and to the presence of outliers. They typically require one or more input parameters, and they may perform poorly on complex datasets that contain a substantial number of noise points. To overcome these drawbacks, we draw inspiration from crude salt purification and propose a parameter-free neighborhood graph method named the Crystallized neighborhood graph (CNG). This method can adaptively capture the distribution structure of complex structured datasets. Based on the CNG, we propose a Hierarchical clustering algorithm based on the Crystallized neighborhood graph for identifying complex structured datasets (HCCNG). It redefines the similarity between sub-graphs using the bridges that connect them and the shortest distance between them. Sub-graphs are then merged repeatedly according to this similarity until the ideal clustering result is achieved. The experimental results show that the HCCNG algorithm can identify not only popular clusters, but also variable-density spherical clusters. Moreover, it performs well on complex structured datasets with a significant amount of noise.
Highlights:
• A new neighborhood graph is used to identify impurity points.
• A new non-parameter neighborhood graph.
• A new method for measuring the similarity between sub-graphs.
• The method can handle datasets with different densities and shapes.
• The method requires only one integer parameter and is easy to use.
DOI:10.1016/j.eswa.2024.125714
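
The merging step described in the summary (sub-graphs merged repeatedly according to a similarity built from bridges and shortest distance) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record does not give the CNG construction or the actual similarity formula, so everything here is an assumption made for illustration. The function names `similarity` and `merge_subgraphs` are hypothetical, the edge list stands in for a precomputed neighborhood graph, the similarity simply ranks pairs by bridge count and breaks ties by the shortest Euclidean gap, and the stopping rule is a target cluster count `n_clusters`.

```python
import math
from itertools import combinations


def similarity(a, b, points, edges):
    """Hypothetical similarity between two sub-graphs (sets of point indices).

    Returns (bridge_count, -shortest_distance), so tuple comparison prefers
    more bridges first and a smaller gap second. This ranking is an assumed
    stand-in, not the formula from the paper.
    """
    # A bridge is a graph edge with one endpoint in each sub-graph.
    bridges = sum(
        1 for i, j in edges if (i in a and j in b) or (i in b and j in a)
    )
    # Shortest Euclidean distance between any point of a and any point of b.
    shortest = min(math.dist(points[i], points[j]) for i in a for j in b)
    return (bridges, -shortest)


def merge_subgraphs(points, edges, subgraphs, n_clusters):
    """Repeatedly merge the most similar pair until n_clusters remain."""
    clusters = [set(s) for s in subgraphs]
    while len(clusters) > n_clusters:
        # combinations yields index pairs (x, y) with x < y.
        x, y = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: similarity(clusters[p[0]], clusters[p[1]], points, edges),
        )
        clusters[x] |= clusters[y]
        del clusters[y]
    return clusters


# Toy data: two nearby sub-graphs joined by one bridge, and one distant sub-graph.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0), (11, 0)]
edges = [(0, 1), (1, 2), (2, 3), (4, 5)]  # assumed neighborhood-graph edges
subgraphs = [{0, 1}, {2, 3}, {4, 5}]      # assumed initial sub-graphs
result = merge_subgraphs(points, edges, subgraphs, n_clusters=2)
print(result)
```

In the toy data, the first two sub-graphs share the bridge (1, 2) and are merged first, while the distant third sub-graph stays separate. Evaluating every pair at every step is quadratic in the number of sub-graphs, which keeps the sketch short but would need caching for larger inputs.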