Parallel random swap: An efficient and reliable clustering algorithm in Java

Bibliographic details
Title: Parallel random swap: An efficient and reliable clustering algorithm in Java
Authors: Nigro, Libero, Cicirelli, Franco, Fränti, Pasi
Contributors: Tietojenkäsittelytieteen laitos
Publisher information: Elsevier B.V.
Publication year: 2023
Collection: University of Eastern Finland: UEF Electronic Publications
Subjects: clustering problem, K-means, random swap, parallelism, Java, Streams, Lambda expressions, actors, message-passing, multi-core machines
Description: Solving large-scale clustering problems requires an efficient algorithm that can also be implemented in parallel. K-means would be suitable, but it can lead to an inaccurate clustering result. To overcome this problem, we present a parallel version of the random swap clustering algorithm. It combines the scalability of k-means with the high clustering accuracy of random swap. The algorithm is implemented in Java in two ways. The first implementation uses Java parallel streams and lambda expressions. The solution exploits a built-in multi-threaded organization capable of offering competitive speedup. The second implementation is achieved on top of the Theatre actor system which ensures better scalability and high-performance computing through fine-grain resource control. The two implementations are then applied to standard benchmark datasets, with a varying population size and distribution of managed records, dimensionality of data points and the number of clusters. The experimental results confirm that high-quality clustering can be obtained together with a very good execution efficiency. Our Java code is publicly available at: https://github.com/uef-machine-learning ; final draft ; peerReviewed
Document type: article in journal/newspaper
Language: English
ISSN: 1569-190X
Relation: Simulation modelling practice and theory; 102712; 124; https://erepo.uef.fi/handle/123456789/29490
Availability: https://erepo.uef.fi/handle/123456789/29490
Rights: CC BY-NC-ND 4.0 ; openAccess ; © 2022 Elsevier B.V. ; https://creativecommons.org/licenses/by-nc-nd/4.0/
Accession number: edsbas.6CE971FA
Database: BASE
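
The description above mentions a random swap implementation built on Java parallel streams and lambda expressions. The following is a minimal sketch of that general idea, not the authors' code (which is available at the GitHub address given in the description). It assumes the usual random swap scheme: replace one randomly chosen centroid with a randomly chosen data point, run a few k-means iterations, and keep the candidate only if the sum of squared errors (SSE) decreases. The class name `RandomSwapSketch` and all method and parameter names are hypothetical.

```java
import java.util.Random;
import java.util.stream.IntStream;

// Illustrative sketch only; all names here are hypothetical.
public class RandomSwapSketch {

    // Squared Euclidean distance between two points of equal dimension.
    static double dist2(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    // Index of the centroid nearest to point p (first index wins on ties).
    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        double bestD = Double.MAX_VALUE;
        for (int j = 0; j < centroids.length; j++) {
            double d = dist2(p, centroids[j]);
            if (d < bestD) { bestD = d; best = j; }
        }
        return best;
    }

    // Partition step: points are assigned independently, so a parallel stream fits.
    static int[] partition(double[][] data, double[][] centroids) {
        return IntStream.range(0, data.length).parallel()
                .map(i -> nearest(data[i], centroids))
                .toArray();
    }

    // Centroid step: mean of each cluster; an empty cluster keeps its old centroid.
    static double[][] update(double[][] data, int[] labels, double[][] old) {
        int k = old.length, dim = data[0].length;
        double[][] sum = new double[k][dim];
        int[] count = new int[k];
        for (int i = 0; i < data.length; i++) {
            count[labels[i]]++;
            for (int d = 0; d < dim; d++) sum[labels[i]][d] += data[i][d];
        }
        for (int j = 0; j < k; j++) {
            if (count[j] == 0) { sum[j] = old[j].clone(); continue; }
            for (int d = 0; d < dim; d++) sum[j][d] /= count[j];
        }
        return sum;
    }

    // Sum of squared errors of the data with respect to the nearest centroids.
    static double sse(double[][] data, double[][] centroids) {
        return IntStream.range(0, data.length).parallel()
                .mapToDouble(i -> dist2(data[i], centroids[nearest(data[i], centroids)]))
                .sum();
    }

    // Random swap: trial swaps, each tuned by a few k-means iterations,
    // accepted only when the SSE strictly improves.
    static double[][] randomSwap(double[][] data, int k, int swaps,
                                 int kmeansIters, long seed) {
        Random rnd = new Random(seed);
        double[][] best = new double[k][];
        for (int j = 0; j < k; j++) best[j] = data[rnd.nextInt(data.length)].clone();
        double bestSse = sse(data, best);
        for (int t = 0; t < swaps; t++) {
            double[][] cand = new double[k][];
            for (int j = 0; j < k; j++) cand[j] = best[j].clone();
            // Swap: one random centroid is moved onto one random data point.
            cand[rnd.nextInt(k)] = data[rnd.nextInt(data.length)].clone();
            for (int it = 0; it < kmeansIters; it++) {
                cand = update(data, partition(data, cand), cand);
            }
            double candSse = sse(data, cand);
            if (candSse < bestSse) { best = cand; bestSse = candSse; }
        }
        return best;
    }
}
```

A call such as `RandomSwapSketch.randomSwap(data, 3, 200, 2, 42L)` would return three centroids for a `double[][]` dataset. In this sketch only the partition and SSE steps run in parallel; the centroid update is left sequential for brevity, which a production implementation would likely parallelize as well.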