Dynamic Replication Policy on HDFS Based on Machine Learning Clustering
Data growth in recent years has been swift, leading to the emergence of big data science. Distributed File Systems (DFS) are commonly used to handle big data, like Google File System (GFS), Hadoop Distributed File System (HDFS), and others. The DFS should provide the availability of data and reliabi...
| Published in: | IEEE Access, Vol. 11; p. 1 |
|---|---|
| Main authors: | Ahmed, Motaz A.; Khafagy, Mohamed H.; Shaheen, Masoud E.; Kaseb, Mostafa R. |
| Format: | Journal Article |
| Language: | English |
| Published: | Piscataway; IEEE; 01.01.2023; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Subjects: | |
| ISSN: | 2169-3536 |
| Online access: | Get full text |
| Abstract | Data growth in recent years has been swift, leading to the emergence of big data science. Distributed File Systems (DFS), such as the Google File System (GFS) and the Hadoop Distributed File System (HDFS), are commonly used to handle big data. A DFS should keep data available and the system reliable in case of failure, and it does so by replicating files in different locations. These replicas consume storage space and other resources. Files differ in importance depending on how frequently they are used, so files that are rarely used do not need to be replicated as many times. This paper introduces a Dynamic Replication Policy using Machine Learning Clustering (DRPMLC) on HDFS, which uses machine learning to cluster the files into groups and applies a different replication policy to each group in order to reduce storage consumption, improve read and write operation times, and preserve the availability and reliability of HDFS as a High-Performance Distributed Computing (HPDC) system. |
|---|---|
| Author | Ahmed, Motaz A.; Shaheen, Masoud E.; Khafagy, Mohamed H.; Kaseb, Mostafa R. |
| Author_xml | 1. Ahmed, Motaz A. (ORCID 0000-0002-2703-5487); 2. Khafagy, Mohamed H.; 3. Shaheen, Masoud E. (ORCID 0000-0003-4853-3415); 4. Kaseb, Mostafa R. (ORCID 0000-0001-9135-3271). All authors: Department of Computer Science, Faculty of Computers and Artificial Intelligence, Fayoum University, Fayoum, Egypt |
| CODEN | IAECCG |
| CitedBy_id | crossref_primary_10_1002_cpe_8081 crossref_primary_10_3233_JIFS_233579 |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023 |
| DOI | 10.1109/ACCESS.2023.3247190 |
| Discipline | Engineering |
| EISSN | 2169-3536 |
| EndPage | 1 |
| Genre | orig-research |
| ISICitedReferencesCount | 4 |
| ISSN | 2169-3536 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Language | English |
| License | https://creativecommons.org/licenses/by/4.0/legalcode |
| ORCID | 0000-0003-4853-3415 0000-0001-9135-3271 0000-0002-2703-5487 0000-0003-0479-0516 |
| OpenAccessLink | https://doaj.org/article/961a1ff489be4c2784250925823a0329 |
| PageCount | 1 |
| PublicationCentury | 2000 |
| PublicationDate | 2023-01-01 |
| PublicationDateYYYYMMDD | 2023-01-01 |
| PublicationDate_xml | – month: 01 year: 2023 text: 2023-01-01 day: 01 |
| PublicationDecade | 2020 |
| PublicationPlace | Piscataway |
| PublicationPlace_xml | – name: Piscataway |
| PublicationTitle | IEEE Access |
| PublicationTitleAbbrev | Access |
| PublicationYear | 2023 |
| Publisher | IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| StartPage | 1 |
| SubjectTerms | Availability; Big Data; Clustering; Computer networks; Data science; Distributed computing; Distributed processing; Feature extraction; File systems; Hadoop Distributed File System; High-Performance Distributed Computing; Machine learning; Reliability; Replicability; Replication; Replication Policy; Support vector machines |
| Title | Dynamic Replication Policy on HDFS Based on Machine Learning Clustering |
| URI | https://ieeexplore.ieee.org/document/10049393 https://www.proquest.com/docview/2780983015 https://doaj.org/article/961a1ff489be4c2784250925823a0329 |
| Volume | 11 |
| openUrl | genre=article; atitle=Dynamic Replication Policy on HDFS Based on Machine Learning Clustering; jtitle=IEEE access; au=Ahmed, Motaz A.; au=Khafagy, Mohamed H.; au=Shaheen, Masoud E.; au=Kaseb, Mostafa R.; date=2023-01-01; issn=2169-3536; eissn=2169-3536; volume=11; spage=18551; epage=18559; doi=10.1109/ACCESS.2023.3247190 |
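
The idea summarised in the abstract (group files by how often they are used, then give each group its own replication factor) can be illustrated with a minimal sketch. This is not the authors' DRPMLC implementation: the record does not state which clustering algorithm or file features the paper uses, so the sketch assumes a plain one-dimensional k-means over access counts. The function names, the sample file names, and the replication factors in `REPLICATION_BY_CLUSTER` are all hypothetical; the only HDFS fact relied on is that the default replication factor is 3.

```python
from typing import Dict, List

# Hypothetical mapping from cluster rank (0 = least used) to replication factor.
# 3 is the HDFS default; the other values are illustrative assumptions.
REPLICATION_BY_CLUSTER = (2, 3, 4)


def kmeans_1d(values: List[float], k: int, iters: int = 50) -> List[int]:
    """Plain 1-D k-means (k >= 2). Returns one cluster index per value,
    where a higher index means a higher centroid."""
    lo, hi = min(values), max(values)
    # Deterministic start: centroids spread evenly over the value range.
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [-1] * len(values)
    for _ in range(iters):
        # Assign every value to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each non-empty cluster's centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels


def plan_replication(access_counts: Dict[str, int]) -> Dict[str, int]:
    """Cluster files by access count and pick a replication factor per file."""
    paths = list(access_counts)
    counts = [float(access_counts[p]) for p in paths]
    labels = kmeans_1d(counts, k=len(REPLICATION_BY_CLUSTER))
    return {p: REPLICATION_BY_CLUSTER[lab] for p, lab in zip(paths, labels)}


if __name__ == "__main__":
    # Made-up access counts, purely for illustration.
    sample = {
        "/logs/a.log": 1,
        "/logs/b.log": 2,
        "/data/c.csv": 50,
        "/data/d.csv": 55,
        "/hot/e.parquet": 100,
        "/hot/f.parquet": 105,
    }
    for path, factor in plan_replication(sample).items():
        print(path, factor)
```

In a real cluster the resulting factors would have to be applied through HDFS itself, and the evenly spaced centroid initialisation used here can leave a cluster empty on skewed data, so a production policy would likely need a more careful clustering step.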