K*-Means: An Efficient Clustering Algorithm with Adaptive Decision Boundaries

Published in: International Journal of Parallel Programming, Vol. 53, No. 1, p. 3
Main authors: Long, Jianwu; Liu, Luping
Format: Journal Article
Language: English
Published: New York, Springer US, 1 February 2025
Springer Nature B.V
ISSN: 0885-7458, 1573-7640
Description
Summary: Conventional k-means algorithms often carry a significant computational burden and depend heavily on the predefined number of clusters k. This paper therefore proposes the k*-means algorithm, which incorporates the concept of the perceptron classification algorithm to transform the distance-based clustering task into a classification problem, significantly improving clustering efficiency. The paper also combines the k*-means algorithm with hierarchical clustering methods that can automatically identify the number of clusters: an initial clustering is performed by the k*-means algorithm with a large preset number of clusters, and the resulting sub-clusters are then merged through hierarchical clustering. Experimental results show that the proposed k*-means method has significant advantages on large-scale datasets. It greatly reduces the number of distance calculations and achieves better runtime than the latest accelerated k-means algorithms. Combined with hierarchical clustering, the k*-means algorithm also shows notable performance on four synthetic and four real datasets. Future work could explore parallelization techniques to further enhance scalability and efficiency on even larger datasets.
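The two-stage idea in the summary (over-cluster with a large k, then merge the sub-clusters hierarchically) can be sketched roughly as follows. This is only an illustration under stated assumptions: plain Lloyd's k-means stands in for the paper's k*-means, whose perceptron-based step is not described in this record, and the function names, the strided initialization, the single-linkage merge on centroids, and the `merge_dist` threshold are all hypothetical choices made for the example.

```python
import math
import random


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means: a stand-in, NOT the paper's k*-means."""
    # Assumed deterministic init: take k evenly strided points as centers.
    step = len(points) // k
    centers = [points[i * step] for i in range(k)]

    def assign():
        # Label each point with the index of its nearest center.
        return [min(range(k), key=lambda j: math.dist(p, centers[j]))
                for p in points]

    for _ in range(iters):
        labels = assign()
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster goes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, assign()


def merge_subclusters(centers, labels, merge_dist):
    """Assumed merge rule: single linkage on centroids with a distance cutoff."""
    parent = list(range(len(centers)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    used = sorted(set(labels))  # ignore centers that own no points
    for a in used:
        for b in used:
            if a < b and math.dist(centers[a], centers[b]) < merge_dist:
                parent[find(a)] = find(b)

    # Relabel merged groups as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(lab), len(roots)) for lab in labels]


# Toy data: two well-separated unit squares of 100 points each.
rng = random.Random(0)
blob_a = [(rng.random(), rng.random()) for _ in range(100)]
blob_b = [(10 + rng.random(), 10 + rng.random()) for _ in range(100)]
points = blob_a + blob_b

centers, labels = kmeans(points, k=6)  # deliberately too many clusters
merged = merge_subclusters(centers, labels, merge_dist=3.0)
print(len(set(merged)))  # → 2
```

On this toy data the six sub-clusters collapse into two groups because centroids inside one square are at most about 1.41 apart while the squares are more than 12 apart, so any cutoff between those values recovers the two blobs without k having been set to 2 in advance.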
DOI: 10.1007/s10766-024-00779-8