Research on K-means Clustering Algorithm Based on MapReduce Distributed Programming Framework
| Published in: | Procedia computer science Vol. 228; pp. 262 - 270 |
|---|---|
| Main Author: | |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V., 2023 |
| Subjects: | |
| ISSN: | 1877-0509 |
| Summary: | As a classical clustering algorithm, K-means has a long research history. In the big data era, its ability to quickly group similar data into the same cluster becomes even more valuable. Combining K-means with the MapReduce distributed computing framework and running it on the Hadoop big data platform can significantly improve the clustering effect. Based on the MapReduce framework, this paper studies the K-means model, including the K-means principle, distance calculation, the content validity index, and the external validity index. On this basis, a K-means clustering flow based on the MapReduce big data programming framework is proposed, and the execution of the algorithm flow is described in detail, providing a guide for implementing the algorithm. |
|---|---|
| DOI: | 10.1016/j.procs.2023.11.030 |
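
The summary describes a K-means flow built on MapReduce, but this record does not reproduce the paper's algorithm. The following is only a minimal single-process sketch of how one K-means iteration might be split into a map step (assign each point to its nearest centroid) and a reduce step (average the points of each cluster). The function names `map_phase`, `reduce_phase`, and `kmeans`, the use of Euclidean distance, and the convergence tolerance are assumptions made for illustration; no Hadoop API is used.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, point) for every input point."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield nearest, p


def reduce_phase(pairs, centroids):
    """Reduce: average the points grouped under each centroid index."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    new_centroids = list(centroids)  # an empty cluster keeps its old centroid
    for idx, members in groups.items():
        new_centroids[idx] = tuple(sum(c) / len(members) for c in zip(*members))
    return new_centroids


def kmeans(points, centroids, max_iter=100, tol=1e-9):
    """Repeat map and reduce until no centroid moves more than tol."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        if all(dist(a, b) <= tol for a, b in zip(centroids, updated)):
            return updated
        centroids = updated
    return centroids


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (9.0, 10.0)]
result = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

In an actual MapReduce job the map step would run in parallel over data splits and the framework would group the emitted pairs by key before the reduce step; here a dictionary stands in for that shuffle.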