K∗-Means: An Efficient Clustering Algorithm with Adaptive Decision Boundaries
| Published in: | International Journal of Parallel Programming, Vol. 53, No. 1, p. 3 |
|---|---|
| Main authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: Springer US, 1 February 2025; Springer Nature B.V. |
| Subjects: | |
| ISSN: | 0885-7458, 1573-7640 |
| Summary: | Conventional k-means algorithms often face significant computational burdens and depend heavily on the predefined number of clusters k. Therefore, this paper proposes the k∗-means algorithm, which incorporates the concept of the perceptron classification algorithm to transform the distance-based clustering task into a classification problem, significantly improving clustering efficiency. Moreover, this paper combines the k∗-means algorithm with hierarchical clustering methods that can automatically identify the number of clusters. An initial clustering is performed using a large preset number of clusters with the k∗-means algorithm, followed by merging the sub-clusters through hierarchical clustering. Experimental results show that the proposed k∗-means method has significant advantages when handling large-scale datasets: it greatly reduces the number of distance calculations and runs faster than the latest accelerated k-means algorithms. In addition, when combined with hierarchical clustering, the k∗-means algorithm performs notably well on both the four synthetic datasets and the four real datasets. Future work could explore parallelization techniques to further enhance its scalability and efficiency on even larger datasets. |
| DOI: | 10.1007/s10766-024-00779-8 |
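
The two-stage idea in the summary (over-cluster with a large preset number of clusters, then merge the sub-clusters hierarchically) can be illustrated with a minimal sketch. It rests on several assumptions: plain Lloyd's k-means stands in for the paper's k∗-means, because this record does not describe the perceptron-based boundary step; merging uses single-linkage with a hypothetical distance `threshold`; initial centers are picked at a fixed stride only to keep the run reproducible; and all function names are invented for illustration.

```python
import math


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means (a stand-in, NOT the paper's k*-means)."""
    # Assumption: strided, deterministic initialization for reproducibility.
    step = max(1, len(points) // k)
    centers = list(points[::step][:k])

    def assign():
        # Nearest center by Euclidean distance.
        return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
                for p in points]

    for _ in range(iters):
        labels = assign()
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, assign()


def merge_centers(centers, threshold):
    """Single-linkage merge: centers closer than `threshold` share a group."""
    parent = list(range(len(centers)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            if math.dist(centers[i], centers[j]) < threshold:
                parent[find(i)] = find(j)
    roots = sorted({find(i) for i in range(len(centers))})
    return [roots.index(find(i)) for i in range(len(centers))]


def cluster(points, k_init, threshold):
    """Over-cluster with k_init sub-clusters, then merge them."""
    centers, labels = kmeans(points, k_init)
    group_of = merge_centers(centers, threshold)
    return [group_of[lab] for lab in labels]


# Two well-separated 5x5 grids of points (synthetic toy data).
blob_a = [(0.25 * x, 0.25 * y) for x in range(5) for y in range(5)]
blob_b = [(10 + px, 10 + py) for px, py in blob_a]
points = blob_a + blob_b

final_labels = cluster(points, k_init=6, threshold=3.0)
print(len(set(final_labels)))  # 2
```

Here six initial sub-clusters collapse into two groups without the final number of clusters being specified up front; the `threshold` value plays the role that a dendrogram cut would play in a full hierarchical method.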