Dynamic Replication Policy on HDFS Based on Machine Learning Clustering


Bibliographic Details
Published in: IEEE Access, Vol. 11, pp. 18551-18559
Main Authors: Ahmed, Motaz A., Khafagy, Mohamed H., Shaheen, Masoud E., Kaseb, Mostafa R.
Format: Journal Article
Language: English
Published: Piscataway IEEE 01.01.2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:
ISSN: 2169-3536
Abstract Data has grown rapidly in recent years, giving rise to big data science. Distributed File Systems (DFS), such as the Google File System (GFS) and the Hadoop Distributed File System (HDFS), are commonly used to handle big data. A DFS should keep data available and the system reliable in case of failure, and it does so by replicating files in different locations. These replicas consume storage space and other resources. Files differ in importance depending on how frequently they are used in the system, so less important files do not warrant many replicas. This paper introduces a Dynamic Replication Policy using Machine Learning Clustering (DRPMLC) on HDFS, which uses machine learning to cluster files into groups and applies a different replication policy to each group in order to reduce storage consumption, improve read and write operation times, and preserve the availability and reliability of HDFS as a High-Performance Distributed Computing (HPDC) system.
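The idea in the abstract, grouping files by how often they are used and giving each group its own replication factor, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the record does not state which clustering algorithm or features DRPMLC uses, so plain one-dimensional k-means over log-scaled access counts stands in for it, and the names `kmeans_1d` and `plan_replication`, the file paths, and the factors 2, 3, and 4 are hypothetical.

```python
import math
from typing import Dict, List


def kmeans_1d(values: List[float], k: int, iterations: int = 50) -> List[int]:
    """Cluster scalar values with plain k-means (illustrative stand-in only).

    Returns one label per value; label 0 is the cluster with the lowest
    centroid and label k-1 the cluster with the highest.
    """
    lo, hi = min(values), max(values)
    # Deterministic start: spread the initial centroids evenly over the range.
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        # Update step: move each centroid to the mean of its members.
        # An empty cluster keeps its previous centroid.
        for c in range(k):
            members = [v for v, lab in zip(values, new_labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    # Relabel so that cluster ids follow ascending centroid order.
    order = sorted(range(k), key=lambda c: centroids[c])
    rank = {cluster: position for position, cluster in enumerate(order)}
    return [rank[lab] for lab in labels]


def plan_replication(access_counts: Dict[str, int],
                     factors: List[int]) -> Dict[str, int]:
    """Map each file path to a replication factor.

    `factors` is ordered from the coldest group to the hottest group;
    its length sets the number of clusters. The values are hypothetical.
    """
    paths = list(access_counts)
    # Log-scale the counts so a few very hot files do not dominate distances.
    features = [math.log1p(access_counts[p]) for p in paths]
    labels = kmeans_1d(features, k=len(factors))
    return {path: factors[label] for path, label in zip(paths, labels)}


if __name__ == "__main__":
    # Hypothetical access counts for six files.
    counts = {
        "/logs/a": 1, "/logs/b": 2,
        "/data/c": 40, "/data/d": 45,
        "/hot/e": 400, "/hot/f": 420,
    }
    for path, factor in plan_replication(counts, [2, 3, 4]).items():
        print(path, factor)
```

On a real cluster, a plan like this could then be applied per path with the standard `hdfs dfs -setrep` shell command; whether the authors apply their policy that way is not stated in this record.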
Author Ahmed, Motaz A.
Shaheen, Masoud E.
Khafagy, Mohamed H.
Kaseb, Mostafa R.
Author_xml – sequence: 1
  givenname: Motaz A.
  orcidid: 0000-0002-2703-5487
  surname: Ahmed
  fullname: Ahmed, Motaz A.
  organization: Department of Computer Science, Faculty of Computers and Artificial Intelligence, Fayoum University, Fayoum, Egypt
– sequence: 2
  givenname: Mohamed H.
  surname: Khafagy
  fullname: Khafagy, Mohamed H.
  organization: Department of Computer Science, Faculty of Computers and Artificial Intelligence, Fayoum University, Fayoum, Egypt
– sequence: 3
  givenname: Masoud E.
  orcidid: 0000-0003-4853-3415
  surname: Shaheen
  fullname: Shaheen, Masoud E.
  organization: Department of Computer Science, Faculty of Computers and Artificial Intelligence, Fayoum University, Fayoum, Egypt
– sequence: 4
  givenname: Mostafa R.
  orcidid: 0000-0001-9135-3271
  surname: Kaseb
  fullname: Kaseb, Mostafa R.
  organization: Department of Computer Science, Faculty of Computers and Artificial Intelligence, Fayoum University, Fayoum, Egypt
CODEN IAECCG
CitedBy_id crossref_primary_10_1002_cpe_8081
crossref_primary_10_3233_JIFS_233579
Cites_doi 10.2991/ijndc.k.200515.007
10.1007/978-3-031-00828-3_34
10.1109/infos.2014.7036682
10.1145/1327452.1327492
10.1109/tpds.2021.3129973
10.1109/iccasit53235.2021.9633522
10.1109/kst53302.2022.9729076
10.1109/iitsi.2010.74
10.1109/netcod.2015.7176790
10.1016/j.asej.2021.06.024
10.32604/cmc.2020.011313
10.1007/s11432-018-9482-6
10.1109/iske47853.2019.9170274
10.1109/tbdata.2019.2907624
10.1007/978-3-031-16092-9_10
10.1109/clusterw.2012.25
10.1016/j.future.2018.08.015
10.1080/02564602.2016.1260498
10.1016/j.matpr.2021.07.041
10.1109/access.2020.2988796
10.1504/IJWET.2018.092401
10.1109/cscloud/edgecom.2019.00015
10.1109/ibssc47189.2019.8973044
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023
DBID 97E
ESBDL
RIA
RIE
AAYXX
CITATION
7SC
7SP
7SR
8BQ
8FD
JG9
JQ2
L7M
L~C
L~D
DOA
DOI 10.1109/ACCESS.2023.3247190
DatabaseName IEEE Xplore (IEEE)
IEEE Xplore Open Access (Activated by CARLI)
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library (IEL)
CrossRef
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Engineered Materials Abstracts
METADEX
Technology Research Database
Materials Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
DOAJ Directory of Open Access Journals
DatabaseTitle CrossRef
Materials Research Database
Engineered Materials Abstracts
Technology Research Database
Computer and Information Systems Abstracts – Academic
Electronics & Communications Abstracts
ProQuest Computer Science Collection
Computer and Information Systems Abstracts
Advanced Technologies Database with Aerospace
METADEX
Computer and Information Systems Abstracts Professional
DatabaseTitleList

Materials Research Database
Database_xml – sequence: 1
  dbid: DOA
  name: DOAJ Directory of Open Access Journals
  url: https://www.doaj.org/
  sourceTypes: Open Website
– sequence: 2
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
EISSN 2169-3536
EndPage 1
ExternalDocumentID oai_doaj_org_article_961a1ff489be4c2784250925823a0329
10_1109_ACCESS_2023_3247190
10049393
Genre orig-research
GroupedDBID 0R~
5VS
6IK
97E
AAJGR
ABAZT
ABVLG
ACGFS
ADBBV
ALMA_UNASSIGNED_HOLDINGS
BCNDV
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
EBS
ESBDL
GROUPED_DOAJ
IPLJI
JAVBF
KQ8
M43
M~E
O9-
OCL
OK1
RIA
RIE
RNS
4.4
AAYXX
AGSQL
CITATION
EJD
7SC
7SP
7SR
8BQ
8FD
JG9
JQ2
L7M
L~C
L~D
ID FETCH-LOGICAL-c409t-bd7332f9a6cd82fecfa87bbbeda2119e17c761297a07813954834dadba43f5b83
IEDL.DBID DOA
ISICitedReferencesCount 4
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000942270800001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 2169-3536
IngestDate Mon Dec 08 04:16:42 EST 2025
Sun Jun 29 15:26:43 EDT 2025
Tue Nov 18 22:19:02 EST 2025
Sat Nov 29 04:02:25 EST 2025
Wed Aug 27 02:14:29 EDT 2025
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Language English
License https://creativecommons.org/licenses/by/4.0/legalcode
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c409t-bd7332f9a6cd82fecfa87bbbeda2119e17c761297a07813954834dadba43f5b83
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ORCID 0000-0003-4853-3415
0000-0001-9135-3271
0000-0002-2703-5487
0000-0003-0479-0516
OpenAccessLink https://doaj.org/article/961a1ff489be4c2784250925823a0329
PQID 2780983015
PQPubID 4845423
PageCount 1
ParticipantIDs proquest_journals_2780983015
ieee_primary_10049393
crossref_primary_10_1109_ACCESS_2023_3247190
crossref_citationtrail_10_1109_ACCESS_2023_3247190
doaj_primary_oai_doaj_org_article_961a1ff489be4c2784250925823a0329
PublicationCentury 2000
PublicationDate 2023-01-01
PublicationDateYYYYMMDD 2023-01-01
PublicationDate_xml – month: 01
  year: 2023
  text: 2023-01-01
  day: 01
PublicationDecade 2020
PublicationPlace Piscataway
PublicationPlace_xml – name: Piscataway
PublicationTitle IEEE access
PublicationTitleAbbrev Access
PublicationYear 2023
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
SSID ssj0000816957
Score 2.292147
SourceID doaj
proquest
crossref
ieee
SourceType Open Website
Aggregation Database
Enrichment Source
Index Database
Publisher
StartPage 1
SubjectTerms Availability
Big Data
Clustering
Computer networks
Data science
Distributed computing
Distributed processing
Feature extraction
File systems
Hadoop Distributed File System
High-Performance Distributed Computing
Machine learning
Reliability
Replicability
Replication
Replication Policy
Support vector machines
Title Dynamic Replication Policy on HDFS Based on Machine Learning Clustering
URI https://ieeexplore.ieee.org/document/10049393
https://www.proquest.com/docview/2780983015
https://doaj.org/article/961a1ff489be4c2784250925823a0329
Volume 11
WOSCitedRecordID wos000942270800001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVAON
  databaseName: DOAJ Directory of Open Access Journals
  customDbUrl:
  eissn: 2169-3536
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0000816957
  issn: 2169-3536
  databaseCode: DOA
  dateStart: 20130101
  isFulltext: true
  titleUrlDefault: https://www.doaj.org/
  providerName: Directory of Open Access Journals
– providerCode: PRVHPJ
  databaseName: ROAD: Directory of Open Access Scholarly Resources
  customDbUrl:
  eissn: 2169-3536
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0000816957
  issn: 2169-3536
  databaseCode: M~E
  dateStart: 20130101
  isFulltext: true
  titleUrlDefault: https://road.issn.org
  providerName: ISSN International Centre
linkProvider Directory of Open Access Journals
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Dynamic+Replication+Policy+on+HDFS+Based+on+Machine+Learning+Clustering&rft.jtitle=IEEE+access&rft.au=Motaz+A.+Ahmed&rft.au=Mohamed+H.+Khafagy&rft.au=Masoud+E.+Shaheen&rft.au=Mostafa+R.+Kaseb&rft.date=2023-01-01&rft.pub=IEEE&rft.eissn=2169-3536&rft.volume=11&rft.spage=18551&rft.epage=18559&rft_id=info:doi/10.1109%2FACCESS.2023.3247190&rft.externalDBID=DOA&rft.externalDocID=oai_doaj_org_article_961a1ff489be4c2784250925823a0329