ParDP: A Parallel Density Peaks-Based Clustering Algorithm

Bibliographic details
Title: ParDP: A Parallel Density Peaks-Based Clustering Algorithm
Authors: Libero Nigro, Franco Cicirelli
Source: Mathematics; Volume 13; Issue 8; Pages: 1285
Publisher information: Multidisciplinary Digital Publishing Institute
Publication year: 2025
Collection: MDPI Open Access Publishing
Subjects: unsupervised clustering, density peaks-based clustering, k-nearest neighbors, principal component analysis, clustering accuracy measures, parallel programming, Java, benchmark and real-world datasets
Description: This paper proposes ParDP, an algorithm and concrete tool for unsupervised clustering, which belongs to the class of density peaks-based clustering methods. Such methods rely on the observation that cluster representative points (centroids) are points of higher local density surrounded by points of lesser density. Candidate centroids, though, are to be far from each other. A key factor of ParDP is adopting a k-Nearest Neighbors (kNN) technique for estimating the density of points. Complete clustering depends on densities and distances among points. ParDP uses principal component analysis to cope with high-dimensional data points. The current implementation relies on Java parallel streams and the built-in lock-free fork/join mechanism, enabling the exploitation of the computing power of commodity multi/many-core machines. This paper demonstrates ParDP’s clustering capabilities by applying it to several benchmark and real-world datasets. ParDP’s operation can either be directed to observe the number of clusters in a dataset or to finalize clustering with an assigned number of clusters. Different internal and external measures can be used to assess the accuracy of a resultant clustering solution.
Document type: text
File description: application/pdf
Language: English
Relation: E1: Mathematics and Computer Science; https://dx.doi.org/10.3390/math13081285
DOI: 10.3390/math13081285
Availability: https://doi.org/10.3390/math13081285
Rights: https://creativecommons.org/licenses/by/4.0/
Accession number: edsbas.83AD2E73
Database: BASE
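Illustrative sketch: the description above names three ingredients: a kNN-based density estimate, a preference for candidate centroids that are both dense and far from each other, and Java parallel streams. The Java fragment below shows how these could fit together in a generic density peaks setting. It is not the authors' implementation. The density formula (inverse of the mean distance to the k nearest neighbors), the ranking by density times separation, and all class and method names are assumptions made for illustration only. PCA and the final assignment of points to clusters are omitted.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical sketch of density peaks-style centroid selection; not ParDP itself.
class DensityPeaksSketch {

    // Euclidean distance between two points of equal dimension.
    static double dist(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return Math.sqrt(s);
    }

    // Assumed density estimate: inverse of the mean distance to the k nearest neighbors.
    // The outer loop over points runs on a parallel stream (fork/join under the hood).
    static double[] knnDensity(double[][] pts, int k) {
        int n = pts.length;
        return IntStream.range(0, n).parallel().mapToDouble(i -> {
            double mean = IntStream.range(0, n)
                    .filter(j -> j != i)
                    .mapToDouble(j -> dist(pts[i], pts[j]))
                    .sorted()
                    .limit(k)
                    .average()
                    .orElse(Double.MAX_VALUE);
            return 1.0 / (mean + 1e-12); // small constant guards against division by zero
        }).toArray();
    }

    // Separation: distance to the nearest point of higher density.
    // Ties in density are broken by index, so exactly one point has no "higher" point;
    // that point receives its distance to the farthest point instead.
    static double[] delta(double[][] pts, double[] rho) {
        int n = pts.length;
        return IntStream.range(0, n).parallel().mapToDouble(i -> {
            double nearestHigher = Double.POSITIVE_INFINITY;
            double farthest = 0.0;
            for (int j = 0; j < n; j++) {
                if (j == i) continue;
                double d = dist(pts[i], pts[j]);
                farthest = Math.max(farthest, d);
                boolean higher = rho[j] > rho[i] || (rho[j] == rho[i] && j < i);
                if (higher) nearestHigher = Math.min(nearestHigher, d);
            }
            return Double.isInfinite(nearestHigher) ? farthest : nearestHigher;
        }).toArray();
    }

    // Picks the c points with the largest product density * separation as candidate centroids.
    static int[] pickCenters(double[] rho, double[] delta, int c) {
        return IntStream.range(0, rho.length).boxed()
                .sorted((a, b) -> Double.compare(rho[b] * delta[b], rho[a] * delta[a]))
                .limit(c)
                .mapToInt(Integer::intValue)
                .toArray();
    }

    public static void main(String[] args) {
        // Two small, well-separated groups of 2-D points (made-up data).
        double[][] pts = {
            {0, 0}, {0, 1}, {1, 0}, {1, 1}, {0.5, 0.5},
            {10, 10}, {10, 11}, {11, 10}, {11, 11}, {10.5, 10.5}
        };
        double[] rho = knnDensity(pts, 2);
        double[] dl = delta(pts, rho);
        System.out.println(Arrays.toString(pickCenters(rho, dl, 2)));
    }
}
```

Each point's density and separation depend only on read-only inputs, so the outer loops can run in parallel without locks. That is the property that makes this style of algorithm a natural fit for parallel streams.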