Research on K-means Clustering Algorithm Based on MapReduce Distributed Programming Framework

K-means is a classical clustering algorithm with a long research history. In the era of big data it becomes even more useful, because it can quickly group similar data into the same cluster. Combining the K-means algorithm with the MapReduce distributed computing frame...

Published in: Procedia Computer Science, Volume 228, pp. 262-270
Main author: Zhang, Ling
Format: Journal Article
Language: English
Publication details: Elsevier B.V., 2023
ISSN: 1877-0509
Abstract K-means is a classical clustering algorithm with a long research history. In the era of big data it becomes even more useful, because it can quickly group similar data into the same cluster. Combining the K-means algorithm with the MapReduce distributed computing framework and running it on the Hadoop big data platform can significantly improve the clustering effect. Building on the structure of the MapReduce framework, this paper studies the K-means model, including the K-means principle, distance calculation, the content validity index, and the external validity index. On this basis, it proposes a K-means clustering flow based on the MapReduce big data programming framework and describes the execution of that flow in detail, providing a guide for implementing the algorithm.
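The abstract describes K-means expressed as a MapReduce flow but this record does not include the paper's own procedure. As a rough illustration only, the sketch below shows how one K-means iteration is commonly split into a map step (assign each point to its nearest centroid) and a reduce step (average the points of each cluster). It runs on a single machine in plain Python, uses Euclidean distance as an assumed metric, and the names `map_phase`, `reduce_phase`, and `kmeans` are hypothetical; it is not the paper's Hadoop implementation.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_phase(points, centroids):
    """Map step: emit (index of nearest centroid, point) for every point."""
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield idx, p


def reduce_phase(pairs, centroids):
    """Reduce step: group points by centroid index and average each group.

    A centroid that received no points keeps its previous position.
    """
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    updated = list(centroids)
    for idx, members in groups.items():
        updated[idx] = tuple(sum(c) / len(members) for c in zip(*members))
    return updated


def kmeans(points, centroids, max_iter=100, tol=1e-9):
    """Repeat map + reduce until no centroid moves more than tol."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        shift = max(dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    return centroids


if __name__ == "__main__":
    data = [(0, 0), (0, 2), (10, 10), (10, 12)]
    print(kmeans(data, [(0, 0), (10, 10)]))
```

On a real cluster the map step would run in parallel over input splits and the grouping by centroid index would be done by the framework's shuffle; here a dictionary stands in for that shuffle.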
Author Zhang, Ling
Author_xml – sequence: 1
  givenname: Ling
  surname: Zhang
  fullname: Zhang, Ling
  email: 442623405@qq.com
  organization: Liaoning Geology Engineering Vocational College, Dandong, China
CitedBy_id crossref_primary_10_23919_JSEE_2025_000023
crossref_primary_10_3390_electronics13101836
Cites_doi 10.1016/j.isprsjprs.2020.02.012
10.1080/09720510.2022.2130566
10.1007/s00500-022-07758-6
10.1016/j.ins.2022.11.139
10.1016/j.future.2017.03.013
ContentType Journal Article
Copyright 2023
Copyright_xml – notice: 2023
DOI 10.1016/j.procs.2023.11.030
Discipline Computer Science
EISSN 1877-0509
EndPage 270
ISSN 1877-0509
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Keywords MapReduce
Clustering Algorithm
K-Means
Algorithm Process
Distributed Programming Framework
Language English
License This is an open access article under the CC BY-NC-ND license.
OpenAccessLink https://dx.doi.org/10.1016/j.procs.2023.11.030
PageCount 9
PublicationCentury 2000
PublicationDate 2023
2023-00-00
PublicationDateYYYYMMDD 2023-01-01
PublicationDate_xml – year: 2023
  text: 2023
PublicationDecade 2020
PublicationTitle Procedia computer science
PublicationYear 2023
Publisher Elsevier B.V
Publisher_xml – name: Elsevier B.V
  publication-title: Procedia Computer Science
SSID ssj0000388917
Score 2.2728229
Snippet As a classical clustering algorithm, K-means algorithm has a profound research background. In the of big data era, K-means algorithms will play a greater...
SourceID crossref
elsevier
SourceType Enrichment Source
Index Database
Publisher
StartPage 262
SubjectTerms Algorithm Process
Clustering Algorithm
Distributed Programming Framework
K-Means
MapReduce
Title Research on K-means Clustering Algorithm Based on MapReduce Distributed Programming Framework
URI https://dx.doi.org/10.1016/j.procs.2023.11.030
Volume 228