A clustering algorithm based on grids for core data and adjacency relationships for edge data

Bibliographic details
Published in: Scientific Reports, Vol. 15, No. 1, p. 18390 - 36
Main author: He, Honglei
Format: Journal Article
Language: English
Publication details: London, Nature Publishing Group UK, 26.05.2025
Nature Publishing Group
Nature Portfolio
ISSN: 2045-2322
Description
Summary: Grid-based clustering algorithms have become a crucial method in data mining because of their efficiency. However, they suffer from parameter sensitivity, poor adaptability to density variations, and misclassification of edge data. Existing research addresses these issues in three main directions: (1) optimizing the adaptive selection of grid parameters, which struggles to handle variations in cluster density; (2) improving grid division methods (e.g., multi-granularity or dynamic grids), which have limited effectiveness on complex-shaped data; and (3) integrating other clustering techniques, which enhances accuracy but increases algorithmic complexity. This paper proposes an improved grid-based clustering algorithm that determines core grids from the uniformity of the data distribution rather than from absolute density, and introduces a clustering strategy for non-core grids based on adjacency relationships. This approach effectively identifies clusters with different densities and reduces dependency on the initial parameters (density threshold R and grid partition parameters M). The proposed algorithm integrates grid clustering, partitioning-based clustering, and grid splitting techniques. It employs a regional processing strategy, applying grid clustering to cluster core regions and combining grid and partitioning techniques for edge regions, to enhance accuracy while maintaining efficiency. Experimental results demonstrate that the proposed algorithm outperforms six other benchmark algorithms on datasets with complex shapes and uneven densities, achieving a balance between efficiency and accuracy.
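
For orientation, the conventional density-threshold grid clustering that the summary takes as its starting point can be sketched roughly as follows. This is not the paper's algorithm: the record does not specify the uniformity-based core-grid test or the adjacency handling of edge data, so neither is attempted here. The function name `grid_cluster`, the restriction to 2-D points, and the use of 8-connected neighbouring cells are assumptions made only for illustration; `m` and `r` stand in for the grid partition parameter M and the density threshold R named in the summary.

```python
from collections import deque


def grid_cluster(points, m, r):
    """Baseline grid clustering on 2-D points (illustrative sketch only).

    m: number of cells per axis (stands in for grid partition parameter M)
    r: minimum point count for a cell to count as dense (stands in for R)
    Returns one label per point; -1 marks points that fall in non-dense cells.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, y0 = min(xs), min(ys)
    # Cell widths; fall back to 1.0 if all points share a coordinate.
    wx = (max(xs) - x0) / m or 1.0
    wy = (max(ys) - y0) / m or 1.0

    def cell_of(p):
        # Clamp so points on the upper boundary land in the last cell.
        return (min(int((p[0] - x0) / wx), m - 1),
                min(int((p[1] - y0) / wy), m - 1))

    # Count points per cell.
    counts = {}
    for p in points:
        c = cell_of(p)
        counts[c] = counts.get(c, 0) + 1

    # Absolute-density test: this is the step the summary says is
    # sensitive to R and to variations in cluster density.
    dense = {c for c, n in counts.items() if n >= r}

    # Flood fill over 8-connected dense cells; each component is a cluster.
    cell_label = {}
    next_label = 0
    for start in sorted(dense):
        if start in cell_label:
            continue
        cell_label[start] = next_label
        queue = deque([start])
        while queue:
            cx, cy = queue.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in cell_label:
                        cell_label[nb] = next_label
                        queue.append(nb)
        next_label += 1

    return [cell_label.get(cell_of(p), -1) for p in points]
```

In this baseline, every point in a cell below the threshold is simply left unlabelled, which illustrates the edge-data misclassification and parameter sensitivity that the summary describes as motivation.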
DOI: 10.1038/s41598-025-00532-2