A Review of Data Placement and Replication Strategies Based on Machine Learning

Bibliographic Details
Published in: Proceedings - International Conference on Parallel and Distributed Systems, pp. 278-285
Main Authors: Najjar, Amir; Mokadem, Riad; Pierson, Jean-Marc
Format: Conference Proceeding
Language: English
Published: IEEE, 10 October 2024
ISSN: 2690-5965
Abstract The global increase in data volumes has created a need for scalable distributed systems that can provide satisfactory quality of service. Data placement and replication are well-known techniques that provide increased performance, improved fault tolerance, and higher availability. These techniques often rely on threshold-based activation mechanisms whose appropriate settings vary with the nature of the workload and the underlying system architecture. Hence, setting and adjusting those thresholds usually requires human intervention. In this context, machine learning offers a promising way to define such thresholds automatically and adapt them to different workloads and architectures. In this paper, we study the data placement and replication strategies proposed in the literature that employ machine learning. We classify these strategies by machine learning method, deployment platform, dynamicity, and achieved objectives. We describe the approach taken by each strategy as well as its possible limitations. In addition, we provide insights into the metrics used to evaluate the strategies. We highlight the need to design data placement and replication strategies that respond better to the modern needs of distributed systems. We also motivate the use of machine learning to achieve autonomy in distributed systems.
Author Mokadem, Riad
Pierson, Jean-Marc
Najjar, Amir
Author_xml – sequence: 1
  givenname: Amir
  surname: Najjar
  fullname: Najjar, Amir
  email: amir.najjar@irit.fr
  organization: Université de Toulouse,Institut de Recherche en Informatique de Toulouse (IRIT),Toulouse,France
– sequence: 2
  givenname: Riad
  surname: Mokadem
  fullname: Mokadem, Riad
  email: riad.mokadem@irit.fr
  organization: Université de Toulouse,Institut de Recherche en Informatique de Toulouse (IRIT),Toulouse,France
– sequence: 3
  givenname: Jean-Marc
  surname: Pierson
  fullname: Pierson, Jean-Marc
  email: jean-marc.pierson@irit.fr
  organization: Université de Toulouse,Institut de Recherche en Informatique de Toulouse (IRIT),Toulouse,France
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/ICPADS63350.2024.00044
Discipline Computer Science
EISBN 9798331515966
EISSN 2690-5965
EndPage 285
ExternalDocumentID 10763884
Genre orig-research
ISICitedReferencesCount 1
IsPeerReviewed false
IsScholarly true
Language English
PageCount 8
PublicationCentury 2000
PublicationDate 2024-Oct.-10
PublicationDateYYYYMMDD 2024-10-10
PublicationDate_xml – month: 10
  year: 2024
  text: 2024-Oct.-10
  day: 10
PublicationDecade 2020
PublicationTitle Proceedings - International Conference on Parallel and Distributed Systems
PublicationTitleAbbrev ICPADS
PublicationYear 2024
Publisher IEEE
Publisher_xml – name: IEEE
StartPage 278
SubjectTerms Costs
Data Placement
Data Replication
Distributed databases
Distributed Systems
Fault tolerance
Fault tolerant systems
Machine Learning
Quality of service
Reinforcement learning
Taxonomy
Time factors
Tuning
Unsupervised learning
Title A Review of Data Placement and Replication Strategies Based on Machine Learning
URI https://ieeexplore.ieee.org/document/10763884