PERFORMANCE COMPARISON OF K-MEANS, PARALLEL K-MEANS AND K-MEANS++.

Detailed bibliography
Title:	PERFORMANCE COMPARISON OF K-MEANS, PARALLEL K-MEANS AND K-MEANS++.
Authors:	Aliguliyev, Ramiz, Tahirzada, Shalala F.
Source:	Reliability: Theory & Applications; 2025 Special Issue, Vol. 20, p169-176, 8p
Subjects:	CLUSTERING algorithms, PATTERN recognition systems, DATA structures, PARALLEL programming, MULTICORE processors, K-means clustering
Abstract:	K-means clustering is a fundamental unsupervised machine learning technique widely applied in various domains such as data analysis, pattern recognition, and clustering-based tasks. However, its efficiency and scalability can be challenged, particularly when dealing with large-scale datasets and complex data structures. This thesis explores strategies to improve the performance of the K-means clustering algorithm through parallelism and iterative techniques. Parallelism leverages modern parallel computing architectures, including multi-core processors and distributed frameworks like Apache Spark, to enhance computational efficiency and scalability. On the other hand, an iterative approach involves refining clustering results through multiple iterations, adjusting cluster centroids, and optimizing convergence criteria. It delves into the design frameworks of these approaches, highlighting their respective advantages and limitations. Comparative analyses are conducted to evaluate the effectiveness of parallelism and iterative techniques in terms of execution time, scalability, clustering accuracy, and convergence speed. The findings contribute to advancing the understanding of how parallelism and iterative strategies can significantly improve K-means clustering performance, especially in the context of big data and complex datasets. By comparatively analyzing parallelism and iterative approaches, this paper aims to contribute to the development of more efficient and scalable clustering algorithms in the Big Data context. [ABSTRACT FROM AUTHOR]
Database:	Complementary Index

ISSN:	1932-2321