Hierarchical clustering algorithm based on Crystallized neighborhood graph for identifying complex structured datasets

Bibliographic Details
Published in: Expert Systems with Applications, Vol. 265, p. 125714
Main Authors: Chen, Zhongshang; Feng, Ji; Yang, Degang; Cai, Fapeng
Format: Journal Article
Language: English
Published: Elsevier Ltd, 15 March 2025
ISSN: 0957-4174
Description
Summary: In data mining, the neighborhood graph is an important tool for describing the distribution of a dataset. However, existing neighborhood graph methods are often sensitive to parameter settings and to the presence of outliers: they typically require one or more input parameters, and they may not perform well on complex datasets that contain a substantial number of noise points. To overcome these drawbacks, we take inspiration from crude salt purification and propose a parameter-free neighborhood graph method named the Crystallized neighborhood graph (CNG), which adaptively captures the distribution structure of complex structured datasets. Based on the CNG, we propose a Hierarchical clustering algorithm based on the Crystallized neighborhood graph for identifying complex structured datasets (HCCNG). HCCNG redefines the similarity between sub-graphs using the bridges between them and the shortest distance between them, and then repeatedly merges sub-graphs according to this similarity until the desired clustering result is obtained. Experimental results show that HCCNG can identify not only manifold clusters but also spherical clusters of varying density. Moreover, it performs well on complex structured datasets with a significant amount of noise.
• A new neighborhood graph is used to identify impurity points.
• A new parameter-free neighborhood graph.
• A new method for measuring the similarity between sub-graphs.
• The method can handle datasets with different densities and shapes.
• The method requires only one integer parameter, making it easy to use.
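The merge step described in the summary (score sub-graph pairs by the bridges between them and their shortest distance, then repeatedly merge the most similar pair) can be illustrated with a minimal Python sketch. It is not the HCCNG implementation: the summary defines neither the Crystallized neighborhood graph nor the exact similarity formula. The initial sub-graphs and edge list below are therefore hand-made toy inputs, the function names subgraph_similarity and merge_subgraphs are hypothetical, the score (1 + bridges) / shortest distance is an assumed way of combining the two quantities the summary names, and the integer k is assumed to be the target number of clusters.

```python
import math
from itertools import combinations

# Hypothetical sketch, NOT the authors' HCCNG: the similarity formula
# (1 + bridges) / shortest_distance is an assumption for illustration.


def subgraph_similarity(a, b, points, edges):
    """Similarity between two sub-graphs given as sets of point indices.

    bridges:  number of edges with one endpoint in a and the other in b.
    shortest: smallest Euclidean distance between a point of a and a point of b.
    The +1 lets unbridged pairs still be ranked by distance (an assumption).
    """
    bridges = sum(1 for u, v in edges
                  if (u in a and v in b) or (u in b and v in a))
    shortest = min(math.dist(points[i], points[j]) for i in a for j in b)
    if shortest == 0:
        return math.inf  # coincident points: treat as maximally similar
    return (1 + bridges) / shortest


def merge_subgraphs(subgraphs, points, edges, k):
    """Greedily merge the most similar pair of sub-graphs until k remain.

    k stands in for the single integer parameter mentioned in the summary
    (assumed here to be the target number of clusters).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [set(s) for s in subgraphs]
    while len(clusters) > k:
        # combinations yields index pairs (i, j) with i < j
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: subgraph_similarity(
                clusters[p[0]], clusters[p[1]], points, edges),
        )
        clusters[i] |= clusters[j]
        del clusters[j]  # safe because j > i
    return clusters


# Toy input: two groups of collinear points. The initial sub-graphs and
# edges are hand-made stand-ins for the output of a neighborhood graph.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
edges = [(0, 1), (1, 2), (3, 4), (4, 5)]
initial = [{0, 1}, {2}, {3, 4}, {5}]

result = merge_subgraphs(initial, points, edges, k=2)
print(result)  # [{0, 1, 2}, {3, 4, 5}]
```

Greedily merging the single most similar pair per round is the usual agglomerative pattern. Recomputing every pairwise score on each round is costly, so this version only suits small illustrative inputs.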
DOI: 10.1016/j.eswa.2024.125714