Correlation-Aware Stripe Organization for Efficient Writes in Erasure-Coded Storage: Algorithms and Evaluation


Published in: IEEE Transactions on Parallel and Distributed Systems, Volume 30, Issue 7, pp. 1552-1564
Main authors: Shen, Zhirong; Lee, Patrick P. C.; Shu, Jiwu; Guo, Wenzhong
Format: Journal Article
Language: English
Published: New York: IEEE, 1 July 2019
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:
ISSN: 1045-9219, 1558-2183
Online access: Get full text
Abstract Erasure coding has been extensively employed for data availability protection in production storage systems by maintaining a low degree of data redundancy. However, how to mitigate the parity update overhead of partial stripe writes in erasure-coded storage systems is still a critical concern. In this paper, we study this problem from two new perspectives: data correlation and stripe organization. We propose CASO, a correlation-aware stripe organization algorithm, which captures the data correlation of a data access stream and uses the data correlation characteristics for stripe organization. It packs correlated data into a small number of stripes to reduce the I/Os incurred in partial stripe writes, and further organizes uncorrelated data into stripes to leverage the spatial locality in later accesses. We implement CASO over Reed-Solomon codes and Azure's Local Reconstruction Codes, and show via extensive trace-driven evaluation that CASO reduces parity updates by up to 29.8 percent and reduces the write time by up to 46.7 percent.
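The idea in the abstract, that packing correlated data into fewer stripes lowers the parity update cost of partial stripe writes, can be illustrated with a small sketch. This is not the CASO algorithm from the paper. It is a simplified greedy grouping written for this record, and it assumes a Reed-Solomon-style code in which a write touching any data chunk of a stripe updates all m parity chunks of that stripe. The function names, the co-occurrence correlation measure, and the sample requests are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def parity_updates(write_request, layout, m):
    """Parity chunks updated by one request: m per distinct stripe touched."""
    stripes = {layout[chunk] for chunk in write_request}
    return m * len(stripes)


def total_parity_updates(requests, layout, m):
    """Sum of parity updates over a whole stream of write requests."""
    return sum(parity_updates(req, layout, m) for req in requests)


def sequential_layout(chunks, k):
    """Baseline: fill stripes with k data chunks in address order."""
    return {chunk: index // k for index, chunk in enumerate(chunks)}


def correlation_counts(requests):
    """Count how often each pair of chunks appears in the same request."""
    counts = Counter()
    for req in requests:
        for a, b in combinations(sorted(set(req)), 2):
            counts[(a, b)] += 1
    return counts


def greedy_correlated_layout(chunks, requests, k):
    """Hypothetical greedy grouping: grow each stripe by repeatedly adding
    the unplaced chunk most correlated with the chunks already in it."""
    counts = correlation_counts(requests)

    def corr(a, b):
        return counts[(min(a, b), max(a, b))]

    unplaced = list(chunks)
    layout = {}
    stripe_id = 0
    while unplaced:
        stripe = [unplaced.pop(0)]
        while len(stripe) < k and unplaced:
            best = max(unplaced, key=lambda c: sum(corr(c, s) for s in stripe))
            unplaced.remove(best)
            stripe.append(best)
        for chunk in stripe:
            layout[chunk] = stripe_id
        stripe_id += 1
    return layout


# Made-up access stream over 8 data chunks, with k = 4 and m = 2.
chunks = list(range(8))
requests = [[0, 4, 5], [0, 4, 5], [1, 2], [3, 6, 7]]
baseline = total_parity_updates(requests, sequential_layout(chunks, 4), 2)
grouped = total_parity_updates(
    requests, greedy_correlated_layout(chunks, requests, 4), 2
)
print(baseline, grouped)
```

On this toy stream the grouped layout needs fewer parity updates than the address-ordered one because chunks 0, 4, and 5, which are always written together, end up in the same stripe. Note that the request [1, 2] becomes more expensive under the grouped layout, so the gain depends on how skewed the correlations are.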
Author Lee, Patrick P. C.
Guo, Wenzhong
Shen, Zhirong
Shu, Jiwu
Author_xml – sequence: 1
  givenname: Zhirong
  orcidid: 0000-0003-2673-5868
  surname: Shen
  fullname: Shen, Zhirong
  email: zhirong.shen2601@gmail.com
  organization: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong
– sequence: 2
  givenname: Patrick P. C.
  orcidid: 0000-0002-4501-4364
  surname: Lee
  fullname: Lee, Patrick P. C.
  email: pclee@cse.cuhk.edu.hk
  organization: Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong
– sequence: 3
  givenname: Jiwu
  surname: Shu
  fullname: Shu, Jiwu
  email: shujw@tsinghua.edu.cn
  organization: Department of Computer Science and Technology, Tsinghua University, Beijing, China
– sequence: 4
  givenname: Wenzhong
  orcidid: 0000-0003-4118-8823
  surname: Guo
  fullname: Guo, Wenzhong
  email: guowenzhong@fzu.edu.cn
  organization: College of Mathematics and Computer Science, Fuzhou University, Fuzhou, China
CODEN ITDSEO
CitedBy_id crossref_primary_10_1145_3617998
crossref_primary_10_1109_TC_2025_3575914
crossref_primary_10_1109_TC_2023_3271064
crossref_primary_10_1007_s10766_024_00773_0
crossref_primary_10_1109_ACCESS_2020_3028381
crossref_primary_10_1155_2022_5392474
Cites_doi 10.1109/18.746809
10.1109/DSN.2015.24
10.1109/SRDS.2017.18
10.1109/SRDS.2015.20
10.1109/IPDPS.2011.78
10.1145/1416944.1416949
10.1109/DSN.2014.57
10.1109/DSN.2011.5958220
10.1109/TC.2007.70830
10.14778/2536222.2536234
10.1137/0108018
10.1109/SRDS.2016.041
10.1016/j.ipl.2004.10.009
10.1145/971701.50214
10.1109/MSST.2010.5496972
10.1145/176979.176981
10.14778/2535573.2488339
10.1007/3-540-45748-8_31
10.1109/MSST.2011.5937230
10.1109/TPDS.2016.2525770
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2019
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2019
DBID 97E
RIA
RIE
AAYXX
CITATION
7SC
7SP
8FD
JQ2
L7M
L~C
L~D
DOI 10.1109/TPDS.2018.2890635
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005–Present
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library (IEL)
CrossRef
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
DatabaseTitle CrossRef
Technology Research Database
Computer and Information Systems Abstracts – Academic
Electronics & Communications Abstracts
ProQuest Computer Science Collection
Computer and Information Systems Abstracts
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts Professional
DatabaseTitleList
Technology Research Database
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
Computer Science
EISSN 1558-2183
EndPage 1564
ExternalDocumentID 10_1109_TPDS_2018_2890635
8598850
Genre orig-research
GrantInformation_xml – fundername: Fujian Provincial Natural Science Foundation
  grantid: 2017J05102
– fundername: Research Grants Council of Hong Kong
  grantid: GRF 14216316; CRF C7036-15G
– fundername: National Natural Science Foundation of China
  grantid: 61602120; 61672159; U1705262; 61832011
  funderid: 10.13039/501100001809
GroupedDBID --Z
-~X
.DC
0R~
29I
4.4
5GY
6IK
97E
AAJGR
AARMG
AASAJ
AAWTH
ABAZT
ABQJQ
ABVLG
ACGFO
ACIWK
AENEX
AGQYO
AHBIQ
AKJIK
AKQYR
ALMA_UNASSIGNED_HOLDINGS
ASUFR
ATWAV
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CS3
DU5
EBS
EJD
HZ~
IEDLZ
IFIPE
IPLJI
JAVBF
LAI
M43
MS~
O9-
OCL
P2P
PQQKQ
RIA
RIE
RNS
TN5
TWZ
UHB
AAYXX
CITATION
7SC
7SP
8FD
JQ2
L7M
L~C
L~D
ID FETCH-LOGICAL-c293t-2e38c0d9d9d13be491cfa070bfa80faad5059fcba37e3450d6ad200252a867bf3
IEDL.DBID RIE
ISICitedReferencesCount 8
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000472199900008&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 1045-9219
IngestDate Sun Nov 30 04:50:43 EST 2025
Sat Nov 29 06:06:47 EST 2025
Tue Nov 18 21:44:59 EST 2025
Wed Aug 27 05:56:09 EDT 2025
IsPeerReviewed true
IsScholarly true
Issue 7
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c293t-2e38c0d9d9d13be491cfa070bfa80faad5059fcba37e3450d6ad200252a867bf3
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ORCID 0000-0002-4501-4364
0000-0003-4118-8823
0000-0003-2673-5868
PQID 2239663569
PQPubID 85437
PageCount 13
ParticipantIDs proquest_journals_2239663569
ieee_primary_8598850
crossref_citationtrail_10_1109_TPDS_2018_2890635
crossref_primary_10_1109_TPDS_2018_2890635
PublicationCentury 2000
PublicationDate 2019-07-01
PublicationDateYYYYMMDD 2019-07-01
PublicationDate_xml – month: 07
  year: 2019
  text: 2019-07-01
  day: 01
PublicationDecade 2010
PublicationPlace New York
PublicationPlace_xml – name: New York
PublicationTitle IEEE transactions on parallel and distributed systems
PublicationTitleAbbrev TPDS
PublicationYear 2019
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
References ref35
zhang (ref34) 2015
ref12
ref15
ref36
ref14
ref31
ref10
soundararajan (ref27) 2008
chan (ref9) 2014
ref1
(ref39) 0
ref17
ref38
ref16
li (ref37) 2004; 6
ref19
ref18
huang (ref2) 2012
plank (ref21) 2008
khan (ref20) 2012
plank (ref33) 2009
li (ref11) 2004
(ref22) 0
li (ref32) 2016
ref25
ref41
schroeder (ref4) 2007
ding (ref26) 2007
ref29
ref8
corbett (ref13) 2004
ref7
xia (ref28) 2015
ford (ref5) 2010
pinheiro (ref3) 2007
ref6
miranda (ref30) 2014
ref40
lueth (ref24) 2004
(ref23) 0
References_xml – year: 2004
  ident: ref24
  article-title: RAID-DP: Network appliance implementation of RAID double parity for data protection
– start-page: 253
  year: 2009
  ident: ref33
  article-title: A performance evaluation and examination of open-source erasure coding libraries for storage
  publication-title: Proc 7th Conf File Storage Technol
– year: 2008
  ident: ref21
  article-title: Jerasure: A library in C/C++ facilitating erasure coding for storage applications-version 1.2
– ident: ref14
  doi: 10.1109/18.746809
– start-page: 61
  year: 2010
  ident: ref5
  article-title: Availability in globally distributed storage systems
  publication-title: Proc 9th USENIX Conf Operating Syst Des Implementation
– start-page: 133
  year: 2014
  ident: ref30
  article-title: CRAID: Online RAID upgrades using dynamic hot data reorganization
  publication-title: Proc USENIX FAST
– ident: ref29
  doi: 10.1109/DSN.2015.24
– year: 0
  ident: ref22
– start-page: 213
  year: 2015
  ident: ref28
  article-title: A tale of two erasure codes in HDFS
  publication-title: Proc 13th USENIX Conf File Storage Technol
– year: 0
  ident: ref23
– ident: ref1
  doi: 10.1109/SRDS.2017.18
– ident: ref40
  doi: 10.1109/SRDS.2015.20
– ident: ref17
  doi: 10.1109/IPDPS.2011.78
– start-page: 2
  year: 2012
  ident: ref2
  article-title: Erasure coding in windows azure storage
  publication-title: Proc USENIX Conf Annu Tech Conf
– ident: ref31
  doi: 10.1145/1416944.1416949
– ident: ref18
  doi: 10.1109/DSN.2014.57
– start-page: 1
  year: 2004
  ident: ref13
  article-title: Row-diagonal parity for double disk failure correction
  publication-title: Proc 3rd USENIX Conf File Storage Technol
– start-page: 377
  year: 2008
  ident: ref27
  article-title: Context-Aware Prefetching at the Storage Server
  publication-title: Proc USENIX Annu Tech Conf
– volume: 6
  start-page: 289
  year: 2004
  ident: ref37
  article-title: CP-miner: A tool for finding copy-paste and related bugs in operating system code
  publication-title: Proc 6th Conf Symp Operating Syst Des Implementation
– ident: ref16
  doi: 10.1109/DSN.2011.5958220
– ident: ref15
  doi: 10.1109/TC.2007.70830
– start-page: 2
  year: 2007
  ident: ref3
  article-title: Failure trends in a large disk drive population
  publication-title: Proc 5th USENIX Conf File Storage Technol
– ident: ref38
  doi: 10.14778/2536222.2536234
– ident: ref12
  doi: 10.1137/0108018
– ident: ref41
  doi: 10.1109/SRDS.2016.041
– ident: ref25
  doi: 10.1016/j.ipl.2004.10.009
– ident: ref35
  doi: 10.1145/971701.50214
– start-page: 163
  year: 2014
  ident: ref9
  article-title: Parity logging with reserved space: Towards efficient updates and recovery in erasure-coded clustered storage
  publication-title: Proc USENIX FAST
– ident: ref36
  doi: 10.1109/MSST.2010.5496972
– ident: ref8
  doi: 10.1145/176979.176981
– ident: ref7
  doi: 10.14778/2535573.2488339
– year: 2007
  ident: ref26
  article-title: DiskSeen: Exploiting disk layout and access history to enhance I/O prefetch
  publication-title: Proc USENIX Annu Tech Conf
– year: 2007
  ident: ref4
  article-title: Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you?
  publication-title: Proc of the 5th USENIX Conf on File and Storage Technologies
– ident: ref6
  doi: 10.1007/3-540-45748-8_31
– year: 2015
  ident: ref34
  article-title: Native erasure coding support inside HDFS
  publication-title: Proc Strata + Hadoop World
– ident: ref19
  doi: 10.1109/MSST.2011.5937230
– start-page: 125
  year: 2016
  ident: ref32
  article-title: Access characteristic guided read and write cost regulation for performance improvement on flash memory
  publication-title: Proc USENIX FAST
– start-page: 173
  year: 2004
  ident: ref11
  article-title: C-Miner: Mining block correlations in storage systems
  publication-title: Proc 3rd USENIX Conf File Storage Technol
– year: 0
  ident: ref39
– ident: ref10
  doi: 10.1109/TPDS.2016.2525770
– start-page: 251
  year: 2012
  ident: ref20
  article-title: Rethinking erasure codes for cloud file systems: Minimizing I/O for recovery and degraded reads
  publication-title: Proc 10th USENIX Conf File Storage Technol
SSID ssj0014504
Score 2.3089964
SourceID proquest
crossref
ieee
SourceType Aggregation Database
Enrichment Source
Index Database
Publisher
StartPage 1552
SubjectTerms Algorithms
Correlation
Data correlation
Distributed databases
Encoding
erasure code
Maintenance engineering
Organizations
Parity
partial stripe writes
Redundancy
Reed-Solomon codes
Storage systems
stripe organization
Title Correlation-Aware Stripe Organization for Efficient Writes in Erasure-Coded Storage: Algorithms and Evaluation
URI https://ieeexplore.ieee.org/document/8598850
https://www.proquest.com/docview/2239663569
Volume 30
WOSCitedRecordID wos000472199900008&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVIEE
  databaseName: IEEE Electronic Library (IEL)
  customDbUrl:
  eissn: 1558-2183
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0014504
  issn: 1045-9219
  databaseCode: RIE
  dateStart: 19900101
  isFulltext: true
  titleUrlDefault: https://ieeexplore.ieee.org/
  providerName: IEEE