A Review of Data Placement and Replication Strategies Based on Machine Learning

The global increase in data volumes has brought forth the need for scalable distributed systems that can provide satisfactory quality of service. Data placement and replication are well-known techniques that provide increased performance, improved fault tolerance and higher availability. These techn...

Bibliographic details
Published in: Proceedings - International Conference on Parallel and Distributed Systems, pp. 278-285
Main authors: Najjar, Amir; Mokadem, Riad; Pierson, Jean-Marc
Format: Conference paper
Language: English
Published: IEEE, 10 October 2024
ISSN: 2690-5965
Abstract The global increase in data volumes has brought forth the need for scalable distributed systems that can provide satisfactory quality of service. Data placement and replication are well-known techniques that provide increased performance, improved fault tolerance and higher availability. These techniques often require threshold-based activation mechanisms that can vary due to the nature of the workload and the underlying system architecture. Hence, setting and adjusting those thresholds usually require human intervention. In this context, machine learning presents a promising facet to automatically define such thresholds to adapt to different workloads and architectures. In this paper, we study the data placement and replication strategies proposed in the literature that employ machine learning. We classify such strategies based on the machine learning method, the platform on which they are deployed, the dynamicity and the achieved objectives. We describe the approach applied by each strategy as well as possible limitations. In addition, we provide insights into metrics used to evaluate the strategies. We highlight the need to design data placement and replication strategies that respond better to modern needs for distributed systems. We also motivate the use of machine learning to achieve autonomy in distributed systems.
Author Mokadem, Riad
Pierson, Jean-Marc
Najjar, Amir
Author_xml – sequence: 1
  givenname: Amir
  surname: Najjar
  fullname: Najjar, Amir
  email: amir.najjar@irit.fr
  organization: Université de Toulouse,Institut de Recherche en Informatique de Toulouse (IRIT),Toulouse,France
– sequence: 2
  givenname: Riad
  surname: Mokadem
  fullname: Mokadem, Riad
  email: riad.mokadem@irit.fr
  organization: Université de Toulouse,Institut de Recherche en Informatique de Toulouse (IRIT),Toulouse,France
– sequence: 3
  givenname: Jean-Marc
  surname: Pierson
  fullname: Pierson, Jean-Marc
  email: jean-marc.pierson@irit.fr
  organization: Université de Toulouse,Institut de Recherche en Informatique de Toulouse (IRIT),Toulouse,France
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/ICPADS63350.2024.00044
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Xplore (IEEE/IET Electronic Library - IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Xplore (IEEE/IET Electronic Library - IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 9798331515966
EISSN 2690-5965
EndPage 285
ExternalDocumentID 10763884
Genre orig-research
IEDL.DBID RIE
ISICitedReferencesCount 1
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001481011800034&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Aug 27 01:59:30 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
PageCount 8
ParticipantIDs ieee_primary_10763884
PublicationCentury 2000
PublicationDate 2024-Oct.-10
PublicationDateYYYYMMDD 2024-10-10
PublicationDate_xml – month: 10
  year: 2024
  text: 2024-Oct.-10
  day: 10
PublicationDecade 2020
PublicationTitle Proceedings - International Conference on Parallel and Distributed Systems
PublicationTitleAbbrev ICPADS
PublicationYear 2024
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 278
SubjectTerms Costs
Data Placement
Data Replication
Distributed databases
Distributed Systems
Fault tolerance
Fault tolerant systems
Machine Learning
Quality of service
Reinforcement learning
Taxonomy
Time factors
Tuning
Unsupervised learning
Title A Review of Data Placement and Replication Strategies Based on Machine Learning
URI https://ieeexplore.ieee.org/document/10763884
WOSCitedRecordID wos001481011800034&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D