Research on K-means Clustering Algorithm Based on MapReduce Distributed Programming Framework
| Published in: | Procedia Computer Science, vol. 228, pp. 262-270 |
|---|---|
| Main author: | Zhang, Ling |
| Format: | Journal Article |
| Language: | English |
| Publisher: | Elsevier B.V, 2023 |
| Subject: | Algorithm Process; Clustering Algorithm; Distributed Programming Framework; K-Means; MapReduce |
| ISSN: | 1877-0509 |
| Online access: | https://dx.doi.org/10.1016/j.procs.2023.11.030 |
| Abstract | As a classical clustering algorithm, K-means has been studied extensively. In the big data era, K-means becomes even more valuable because it can quickly group similar data into the same cluster. Combining the K-means algorithm with the MapReduce distributed computing framework and running it on the Hadoop big data platform can significantly improve the clustering effect. Based on the structure of the MapReduce framework, this paper studies the K-means model, including the K-means principle, distance calculation, the content validity index, and the external validity index. On this basis, a K-means clustering flow based on the MapReduce big data programming framework is proposed, and the execution process of the algorithm flow is described in detail, providing a guide for implementing the algorithm. |
|---|---|
| Author | Zhang, Ling |
| Author_xml | – sequence: 1 givenname: Ling surname: Zhang fullname: Zhang, Ling email: 442623405@qq.com organization: Liaoning Geology Engineering Vocational College, Dandong, China |
| CitedBy_id | crossref_primary_10_23919_JSEE_2025_000023 crossref_primary_10_3390_electronics13101836 |
| Cites_doi | 10.1016/j.isprsjprs.2020.02.012 10.1080/09720510.2022.2130566 10.1007/s00500-022-07758-6 10.1016/j.ins.2022.11.139 10.1016/j.future.2017.03.013 |
| ContentType | Journal Article |
| Copyright | 2023 |
| Copyright_xml | – notice: 2023 |
| DBID | 6I. AAFTH AAYXX CITATION |
| DOI | 10.1016/j.procs.2023.11.030 |
| DatabaseName | ScienceDirect Open Access Titles Elsevier:ScienceDirect:Open Access CrossRef |
| DatabaseTitle | CrossRef |
| DatabaseTitleList | |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISSN | 1877-0509 |
| EndPage | 270 |
| ExternalDocumentID | 10_1016_j_procs_2023_11_030 S1877050923018549 |
| GroupedDBID | --K 0R~ 0SF 1B1 457 5VS 6I. 71M AACTN AAEDT AAEDW AAFTH AAIKJ AALRI AAQFI AAXUO ABMAC ACGFS ADBBV ADEZE ADVLN AEXQZ AFTJW AGHFR AITUG AKRWK ALMA_UNASSIGNED_HOLDINGS AMRAJ E3Z EBS EJD EP3 FDB FNPLU HZ~ IXB KQ8 M41 M~E NCXOZ O-L O9- OK1 P2P RIG ROL SES SSZ 9DU AAYWO AAYXX ABWVN ACRPL ACVFH ADCNI ADNMO AEUPX AFPUW AIGII AKBMS AKYEP CITATION ~HD |
| ID | FETCH-LOGICAL-c2630-17468ad3196641f17760145eceefea367a874ce35e4a5b6c3e7c22726d9b82e73 |
| ISSN | 1877-0509 |
| IngestDate | Sat Nov 29 03:07:55 EST 2025 Tue Nov 18 21:52:36 EST 2025 Sat Sep 14 18:13:16 EDT 2024 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | MapReduce; Clustering Algorithm; K-Means; Algorithm Process; Distributed Programming Framework |
| Language | English |
| License | This is an open access article under the CC BY-NC-ND license. |
| LinkModel | OpenURL |
| MergedId | FETCHMERGED-LOGICAL-c2630-17468ad3196641f17760145eceefea367a874ce35e4a5b6c3e7c22726d9b82e73 |
| OpenAccessLink | https://dx.doi.org/10.1016/j.procs.2023.11.030 |
| PageCount | 9 |
| ParticipantIDs | crossref_citationtrail_10_1016_j_procs_2023_11_030 crossref_primary_10_1016_j_procs_2023_11_030 elsevier_sciencedirect_doi_10_1016_j_procs_2023_11_030 |
| PublicationCentury | 2000 |
| PublicationDate | 2023 |
| PublicationDateYYYYMMDD | 2023-01-01 |
| PublicationDate_xml | – year: 2023 text: 2023 |
| PublicationDecade | 2020 |
| PublicationTitle | Procedia computer science |
| PublicationYear | 2023 |
| Publisher | Elsevier B.V |
| Publisher_xml | – name: Elsevier B.V |
| References |
[1] Li, Liu, Pan (2020). Map-Balance-Reduce: An improved parallel programming model for load balancing of MapReduce. Future Generation Computer Systems, 105, 993-1001. doi:10.1016/j.future.2017.03.013
[2] Wang, Chen, Yu (2020). Segmentation of large-scale remotely sensed images on a Spark platform: A strategy for handling massive image tiles with the MapReduce model. ISPRS Journal of Photogrammetry and Remote Sensing, 162, 137-147. doi:10.1016/j.isprsjprs.2020.02.012
[3] Natesan, Sathishkumar, Kumar, Maheshwari, Prabhu, Allayear Shaikh Muhammad (2023). A Distributed Framework for Predictive Analytics Using Big Data and MapReduce Parallel Programming. Mathematical Problems in Engineering.
[4] Gavua, Kecskemeti (2023). Improving MapReduce Speculative Executions with Global Snapshots. International Journal of Advanced Computer Science and Applications (IJACSA), 14, 12-22.
[5] Ramani, Vimala Devi, Ruba Soundar (2022). Retraction Note: MapReduce-based big data framework using modified artificial neural network classifier for diabetic chronic disease prediction. Soft Computing, 27, 1827. doi:10.1007/s00500-022-07758-6
[6] Eka, Afdhal, Andry, Suhartono Derwin (2023). Clustering models for hospitals in Jakarta using fuzzy c-means and k-means. Procedia Computer Science, 216, 356-363.
[7] Rustam, Nenad, Bassem, Mussabayev Ravil (2023). How to Use K-means for Big Data Clustering? Pattern Recognition, 137.
[8] Debasmita, Parthajit, Maiti Moinak (2023). A K-means clustering model for analyzing the Bitcoin extreme value returns. Decision Analytics Journal, 6, 1-9.
[9] Ikotun Abiodun, Ezugwu Absalom, Laith, Abuhaija Belal, Heming Jia (2023). K-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of big data. Information Sciences, 622, 178-210. doi:10.1016/j.ins.2022.11.139
[10] Usha, Verma, Nahar Pooja (2022). Applicability of K-medoids and K-means algorithms for segmenting students based on their scholastic performance. Journal of Statistics and Management Systems, 25, 1621-1632. doi:10.1080/09720510.2022.2130566
|
| SSID | ssj0000388917 |
| Score | 2.2728229 |
| SourceID | crossref elsevier |
| SourceType | Enrichment Source Index Database Publisher |
| StartPage | 262 |
| SubjectTerms | Algorithm Process; Clustering Algorithm; Distributed Programming Framework; K-Means; MapReduce |
| Title | Research on K-means Clustering Algorithm Based on MapReduce Distributed Programming Framework |
| URI | https://dx.doi.org/10.1016/j.procs.2023.11.030 |
| Volume | 228 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVHPJ databaseName: ROAD: Directory of Open Access Scholarly Resources customDbUrl: eissn: 1877-0509 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0000388917 issn: 1877-0509 databaseCode: M~E dateStart: 20100101 isFulltext: true titleUrlDefault: https://road.issn.org providerName: ISSN International Centre |
| linkProvider | ISSN International Centre |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Research+on+K-means+Clustering+Algorithm+Based+on+MapReduce+Distributed+Programming+Framework&rft.jtitle=Procedia+computer+science&rft.au=Zhang%2C+Ling&rft.date=2023&rft.pub=Elsevier+B.V&rft.issn=1877-0509&rft.eissn=1877-0509&rft.volume=228&rft.spage=262&rft.epage=270&rft_id=info:doi/10.1016%2Fj.procs.2023.11.030&rft.externalDocID=S1877050923018549 |
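The abstract describes a K-means clustering flow on MapReduce, but this record gives no implementation details. As a hedged illustration only, the Python sketch below simulates one common way to express K-means as repeated MapReduce rounds in a single process. The map phase assigns each point to its nearest centroid by Euclidean distance, a map-side combiner pre-aggregates per-cluster sums and counts, and the reduce phase averages them into new centroids. A driver loop repeats this until the largest centroid shift falls below a tolerance. All function names (`map_split`, `reduce_phase`, `kmeans_mapreduce`, `sse`), the combiner, the empty-cluster rule, and the stopping criterion are assumptions, not the paper's method, and no Hadoop API is used. The within-cluster sum of squared errors (SSE) is included only as one common internal quality measure; the validity indices studied in the paper may differ.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Partial = Dict[int, Tuple[List[float], int]]  # cluster id -> (coordinate sums, count)


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance; sufficient for nearest-centroid comparisons."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(squared_distance(a, b))


def nearest(point: Sequence[float], centroids: List[Point]) -> int:
    """Index of the closest centroid (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))


def map_split(split: List[Point], centroids: List[Point]) -> Partial:
    """Hypothetical map phase plus local combiner for one input split.

    Each point is keyed by its nearest centroid; the combiner folds the
    points of each key into (coordinate-wise sum, count) before the shuffle.
    """
    partial: Partial = {}
    for p in split:
        cid = nearest(p, centroids)
        total, count = partial.get(cid, ([0.0] * len(p), 0))
        partial[cid] = ([t + x for t, x in zip(total, p)], count + 1)
    return partial


def reduce_phase(mapped: List[Partial], old: List[Point]) -> List[Point]:
    """Shuffle by cluster id, then reduce: new centroid = total sum / total count."""
    grouped = defaultdict(list)
    for partial in mapped:
        for cid, value in partial.items():
            grouped[cid].append(value)
    new = list(old)  # assumption: a cluster that receives no points keeps its old centroid
    for cid, values in grouped.items():
        total = [0.0] * len(old[cid])
        count = 0
        for sums, n in values:
            total = [t + x for t, x in zip(total, sums)]
            count += n
        new[cid] = tuple(t / count for t in total)
    return new


def kmeans_mapreduce(splits: List[List[Point]], centroids: List[Point],
                     max_iter: int = 20, tol: float = 1e-6) -> List[Point]:
    """Driver loop: one map/reduce round per iteration until centroids stop moving."""
    for _ in range(max_iter):
        mapped = [map_split(s, centroids) for s in splits]  # parallel map tasks on a real cluster
        new = reduce_phase(mapped, centroids)
        shift = max(euclidean(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift <= tol:
            break
    return centroids


def sse(splits: List[List[Point]], centroids: List[Point]) -> float:
    """Within-cluster sum of squared errors, one common internal quality measure."""
    return sum(squared_distance(p, centroids[nearest(p, centroids)]) for s in splits for p in s)
```

For example, with two splits `[(0, 0), (0, 1), (10, 10)]` and `[(1, 0), (10, 11), (11, 10)]` and initial centroids `(0, 0)` and `(10, 10)`, the loop stabilizes after the second round at centroids (1/3, 1/3) and (31/3, 31/3), with an SSE of 8/3. Pre-aggregating in the combiner means each split emits at most k (sum, count) pairs instead of every point, which is the usual reason to add one in a MapReduce K-means job.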