GriT-DBSCAN: A spatial clustering algorithm for very large databases
| Published in: | Pattern Recognition, Vol. 142, p. 109658 |
|---|---|
| Main authors: | , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 1 October 2023 |
| Subjects: | |
| ISSN: | 0031-3203, 1873-5142 |
| Summary: | • A grid-based algorithm for exact DBSCAN is proposed for large databases. • A grid tree is devised to speed up queries for non-empty neighboring grids. • Spatial relationships among points are used to omit unnecessary distance calculations. • The efficiency of the proposed algorithm is proved theoretically and experimentally.
DBSCAN is a fundamental spatial clustering algorithm with numerous practical applications. However, a bottleneck of DBSCAN is its O(n²) worst-case time complexity. To address this limitation, we propose a new grid-based algorithm for exact DBSCAN in Euclidean space called GriT-DBSCAN, which is based on the following two techniques. First, we introduce a grid tree to organize the non-empty grids for the purpose of efficient queries for non-empty neighboring grids. Second, by utilizing the spatial relationships among points, we propose a technique that iteratively prunes unnecessary distance calculations when determining whether the minimum distance between two sets is less than or equal to a certain threshold. We theoretically demonstrate that GriT-DBSCAN has excellent reliability in terms of time complexity. In addition, we obtain two variants of GriT-DBSCAN by incorporating heuristics, or by combining the second technique with an existing algorithm. Experiments are conducted on both synthetic and real-world data sets to evaluate the efficiency of GriT-DBSCAN and its variants. The results show that our algorithms outperform existing algorithms. |
|---|---|
| DOI: | 10.1016/j.patcog.2023.109658 |
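
The summary mentions organizing non-empty grids and querying their non-empty neighbors. As a rough illustration of that general idea only, the sketch below partitions points into cells and looks up nearby non-empty cells. It is not the authors' grid tree: a plain hash map with a brute-force offset scan stands in for it. The cell side length ε/√d (so that any two points in one cell are within ε of each other) is an assumption taken from common grid-based exact DBSCAN practice, and the names `build_grid` and `neighboring_cells` are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def build_grid(points, eps):
    """Map each non-empty cell (a tuple of ints) to the indices of its points."""
    d = len(points[0])
    side = eps / math.sqrt(d)  # assumed side length: the cell diagonal equals eps
    cells = defaultdict(list)
    for i, p in enumerate(points):
        cells[tuple(math.floor(x / side) for x in p)].append(i)
    return dict(cells), side


def neighboring_cells(cells, key):
    """Return non-empty cells that could hold a point within eps of cell `key`.

    Brute-force stand-in for a grid tree query: scan all integer offsets and
    keep those whose minimum possible distance to `key` is below eps.
    """
    d = len(key)
    reach = math.ceil(math.sqrt(d))  # larger offsets are always farther than eps
    found = []
    for off in product(range(-reach, reach + 1), repeat=d):
        # Squared minimum cell-to-cell distance, in units of side**2.
        gap = sum(max(abs(o) - 1, 0) ** 2 for o in off)
        cand = tuple(k + o for k, o in zip(key, off))
        if cand != key and gap < d and cand in cells:
            found.append(cand)
    return found
```

The offset scan visits (2·⌈√d⌉ + 1)^d candidate cells per query, which grows quickly with the dimension d; avoiding that kind of cost is presumably what a dedicated index over the non-empty grids is for.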