NaGB-DBSCAN: An improved DBSCAN clustering algorithm by natural neighbor and granular-ball


Bibliographic details
Published in: Information Sciences, vol. 719, p. 122445
Main authors: Luo, Ranliang; Li, Tianshuo; Pu, Rui; Yang, Juntao; Tang, Dongming; Yang, Lijun
Format: Journal Article
Language: English
Published: Elsevier Inc, 1 November 2025
ISSN: 0020-0255
Description
Summary: DBSCAN is a robust density-based clustering algorithm that performs well on noisy and irregular datasets. However, it depends on two parameters (ϵ and m), which are troublesome to tune. Moreover, it scans all data points one by one, resulting in a time complexity of O(N²). To solve these problems, we propose NaGB-DBSCAN, an improved DBSCAN algorithm that combines Natural Neighbor and Granular-Ball. Our method has several advantages: (1) It requires only a single parameter, which makes it significantly easier to set up. (2) It reduces the data-processing workload by combining natural neighbor with granular-ball to cover the original dataset, which lowers the time complexity to O(N log N). (3) It effectively handles datasets with heterogeneous density and weak connections, enhancing clustering efficiency and quality by optimizing the process and accurately identifying distinct cluster structures. Finally, we validate the effectiveness of our method on 17 synthetic datasets, 14 real datasets, and 1 immune cell dataset. The results show that NaGB-DBSCAN ranks first in average Purity and NMI scores, with both exceeding 90% on the immune cell dataset. Furthermore, pairwise t-tests confirm the statistical significance of these results.
• We propose a DBSCAN variant based on Natural Neighbor and Granular-Ball.
• It replaces ϵ and m with a threshold Rt to reduce parameter dependence.
• Adaptive granular-ball generation is achieved using the Natural Neighbor.
• It effectively handles datasets with complex shapes and varying densities.
DOI: 10.1016/j.ins.2025.122445
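
For orientation, the two-parameter baseline that the summary contrasts against can be sketched as follows. This is a minimal, illustrative implementation of classic DBSCAN only, not of NaGB-DBSCAN, whose details are not given in this record. The names `region_query`, `dbscan`, `eps`, and `min_pts` are assumptions chosen for this sketch (`eps` stands for ϵ and `min_pts` for m). Each neighborhood query scans every point, which is where the O(N²) cost mentioned in the summary comes from.

```python
from math import dist

NOISE = -1  # label used for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i).

    This is a full linear scan, so calling it once per point costs O(N^2).
    """
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Classic DBSCAN: return one cluster label per point, NOISE for outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # not a core point; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
    return labels


# Two tight squares and one isolated point (made-up toy data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Both `eps` and `min_pts` have to be chosen by hand here, and a poor choice merges or fragments clusters; this is the tuning burden that the single threshold Rt described in the summary is meant to reduce.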