NaGB-DBSCAN: An improved DBSCAN clustering algorithm by natural neighbor and granular-ball

Detailed bibliography
Published in: Information Sciences, Vol. 719, p. 122445
Main authors: Luo, Ranliang; Li, Tianshuo; Pu, Rui; Yang, Juntao; Tang, Dongming; Yang, Lijun
Medium: Journal Article
Language: English
Published: Elsevier Inc, 1 November 2025
ISSN: 0020-0255
Description
Abstract: DBSCAN is a robust density-based clustering algorithm that performs well on noisy and irregularly shaped datasets. However, it depends on two parameters (ϵ and m) that are troublesome to tune. Moreover, it scans all data points one by one, giving a time complexity of O(N^2). To address these problems, we propose an improved DBSCAN algorithm, named NaGB-DBSCAN, that combines Natural Neighbor and Granular-Ball. Our method offers the following advantages: (1) It requires only a single parameter, which makes it significantly easier to set up. (2) It reduces the data processing workload by combining natural neighbors with granular-balls to cover the original dataset, lowering the time complexity to O(N log N). (3) It effectively handles datasets with heterogeneous density and weak connections, improving clustering efficiency and quality by optimizing the process and accurately identifying distinct cluster structures. Finally, we validate the effectiveness of our method on 17 synthetic datasets, 14 real datasets, and 1 immune cell dataset. The results show that NaGB-DBSCAN ranks first in average Purity and NMI scores, with both exceeding 90% on the immune cell dataset. Furthermore, pairwise t-test hypothesis experiments confirm the statistical significance of these results.
Highlights:
• We propose a DBSCAN variant based on Natural Neighbor and Granular-Ball.
• Replaces ϵ and m with a threshold Rt to reduce parameter dependence.
• Adaptive granular-ball generation is achieved using the Natural Neighbor.
• Effectively handles datasets with complex shapes and varying densities.
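The abstract credits natural neighbors with removing the need for a hand-tuned neighborhood size. As a rough illustration only, the sketch below implements one common formulation of natural-neighbor search, not necessarily the variant used in NaGB-DBSCAN. The neighborhood size r grows round by round, each point adds its r-th nearest neighbor, and the search stops once the number of points that no other point lists as a neighbor reaches zero or stops changing. The final r (often called the natural eigenvalue λ) and the mutual-neighbor sets therefore come from the data rather than from a user setting. The function name `natural_neighbors` is hypothetical. The brute-force distance sort costs O(N^2 log N); reaching the O(N log N) bound claimed in the abstract would presumably require a spatial index such as a KD-tree.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def natural_neighbors(points: Sequence[Point]) -> Tuple[int, List[Set[int]]]:
    """Hypothetical helper: return (lambda, natural-neighbor sets) by brute force."""
    n = len(points)
    # Pre-sort every point's neighbors by distance (ties broken by index).
    order: List[List[int]] = []
    for i in range(n):
        others = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        order.append([j for _, j in others])

    reverse_count = [0] * n  # how many points currently list point j as a neighbor
    knn: List[Set[int]] = [set() for _ in range(n)]
    prev_zero = -1
    r = 0
    while r < n - 1:
        r += 1
        for i in range(n):
            j = order[i][r - 1]  # the r-th nearest neighbor of point i
            knn[i].add(j)
            reverse_count[j] += 1
        # Points that nobody has chosen as a neighbor yet.
        zero = sum(1 for c in reverse_count if c == 0)
        if zero == 0 or zero == prev_zero:
            break  # stable: stop growing the neighborhood
        prev_zero = zero

    # Natural neighbors are the mutual members of the r-nearest-neighbor sets.
    nan = [{j for j in knn[i] if i in knn[j]} for i in range(n)]
    return r, nan
```

For example, on the points (0, 0), (1, 0), (10, 0), (11, 0) every point is chosen by its nearest partner in the first round, so the search stops at λ = 1 and returns the pairs {1}, {0}, {3}, {2}.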
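Granular-ball methods replace many raw points with a smaller set of balls, so later steps operate on fewer objects. The abstract does not say exactly how NaGB-DBSCAN builds or links its balls, so the following is only an assumed illustration. Each ball is summarized by the mean of its points and the average distance to that mean, and two balls are treated as connected when their centers are no farther apart than the sum of their radii. `GranularBall`, `make_ball`, and `balls_overlap` are hypothetical names, and both the radius definition and the overlap rule are assumptions.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class GranularBall:
    center: Point
    radius: float
    size: int  # number of original points the ball covers


def make_ball(points: Sequence[Point]) -> GranularBall:
    """Hypothetical helper: summarize a non-empty group of points as one ball."""
    n = len(points)
    dim = len(points[0])
    center = tuple(sum(p[d] for p in points) / n for d in range(dim))
    # Assumed radius: average distance from the member points to the center.
    radius = sum(math.dist(p, center) for p in points) / n
    return GranularBall(center, radius, n)


def balls_overlap(a: GranularBall, b: GranularBall) -> bool:
    """Assumed connectivity rule: balls touch if centers lie within r_a + r_b."""
    return math.dist(a.center, b.center) <= a.radius + b.radius
```

Under these assumptions, a density-based pass would link balls rather than individual points, which is where the reduction in workload described in the abstract would come from.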
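Purity, one of the two evaluation scores reported in the abstract, assigns each predicted cluster to its most frequent true class and returns the fraction of points that fall in that majority class: Purity = (1/N) Σ_k max_j |c_k ∩ t_j|, where c_k are the predicted clusters and t_j the true classes. The following minimal sketch computes it from two label sequences. The function name `purity` is our own and not taken from the paper.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


def purity(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> float:
    """Purity = (1/N) * sum over predicted clusters of the largest true-class count."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    if not labels_true:
        raise ValueError("need at least one labelled point")
    # Group the true labels by predicted cluster.
    clusters: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for t, p in zip(labels_true, labels_pred):
        clusters[p].append(t)
    # Count the points that belong to their cluster's majority class.
    hits = sum(Counter(members).most_common(1)[0][1] for members in clusters.values())
    return hits / len(labels_true)
```

Noise points (label -1 in many DBSCAN implementations) are treated here as one more cluster; the abstract does not state how noise was handled when scoring.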
DOI: 10.1016/j.ins.2025.122445