Correlation-Aware Stripe Organization for Efficient Writes in Erasure-Coded Storage: Algorithms and Evaluation
| Published in: | IEEE Transactions on Parallel and Distributed Systems, Vol. 30, No. 7, pp. 1552-1564 |
|---|---|
| Main authors: | Zhirong Shen, Patrick P. C. Lee, Jiwu Shu, Wenzhong Guo |
| Format: | Journal Article |
| Language: | English |
| Publication details: | New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 1 July 2019 |
| ISSN: | 1045-9219, 1558-2183 |
| Abstract | Erasure coding has been extensively employed for data availability protection in production storage systems because it maintains a low degree of data redundancy. However, mitigating the parity update overhead of partial stripe writes in erasure-coded storage systems remains a critical concern. In this paper, we study this problem from two new perspectives: data correlation and stripe organization. We propose CASO, a correlation-aware stripe organization algorithm, which captures the data correlation of a data access stream and uses the correlation characteristics for stripe organization. It packs correlated data into a small number of stripes to reduce the I/Os incurred in partial stripe writes, and further organizes uncorrelated data into stripes to leverage spatial locality in later accesses. We implement CASO over Reed-Solomon codes and Azure's Local Reconstruction Codes, and show via extensive trace-driven evaluation that CASO reduces parity updates by up to 29.8 percent and reduces the write time by up to 46.7 percent. |
|---|---|
| Authors | 1. Zhirong Shen (ORCID 0000-0003-2673-5868), zhirong.shen2601@gmail.com, Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong; 2. Patrick P. C. Lee (ORCID 0000-0002-4501-4364), pclee@cse.cuhk.edu.hk, Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong; 3. Jiwu Shu, shujw@tsinghua.edu.cn, Department of Computer Science and Technology, Tsinghua University, Beijing, China; 4. Wenzhong Guo (ORCID 0000-0003-4118-8823), guowenzhong@fzu.edu.cn, College of Mathematics and Computer Science, Fuzhou University, Fuzhou, China |
| CODEN | ITDSEO |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2019 |
| DOI | 10.1109/TPDS.2018.2890635 |
| Discipline | Engineering; Computer Science |
| Genre | orig-research |
| Funding | Fujian Provincial Natural Science Foundation (grant 2017J05102); Research Grants Council of Hong Kong (grants GRF 14216316 and CRF C7036-15G); National Natural Science Foundation of China (grants 61602120, 61672159, U1705262, and 61832011; funder ID 10.13039/501100001809) |
| IsPeerReviewed | true |
| IsScholarly | true |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| PageCount | 13 |
| SubjectTerms | Algorithms; Correlation; Data correlation; Distributed databases; Encoding; erasure code; Maintenance engineering; Organizations; Parity; partial stripe writes; Redundancy; Reed-Solomon codes; Storage systems; stripe organization |
| URI | https://ieeexplore.ieee.org/document/8598850 https://www.proquest.com/docview/2239663569 |
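
The abstract's central claim, that packing correlated data into a small number of stripes reduces parity updates during partial stripe writes, can be illustrated with a toy model. The sketch below is not the CASO algorithm from the paper. It assumes a code with `k` data chunks and `m` parity chunks per stripe, and it assumes that every stripe touched by a write request has each of its `m` parity chunks updated once, no matter how many of its data chunks are written. The function names `parity_updates` and `organize` and the example workload are hypothetical.

```python
from typing import Dict, Iterable, List


def parity_updates(write: Iterable[int], stripe_of: Dict[int, int], m: int) -> int:
    """Count parity chunk updates caused by one write request.

    Simplifying assumption: each stripe touched by the request has all
    m of its parity chunks updated once, however many of its data
    chunks are written.
    """
    touched = {stripe_of[chunk] for chunk in write}
    return m * len(touched)


def organize(groups: List[List[int]], k: int) -> Dict[int, int]:
    """Map chunks to stripes of k data chunks, filling stripes in list order."""
    order = [chunk for group in groups for chunk in group]
    return {chunk: index // k for index, chunk in enumerate(order)}


k, m = 4, 2
chunks = list(range(8))

# Hypothetical workload: chunks 0, 2, 5 and 7 are always written together.
correlated = [0, 2, 5, 7]
others = [c for c in chunks if c not in correlated]

# Baseline: stripes follow chunk order, giving {0,1,2,3} and {4,5,6,7}.
baseline = organize([chunks], k)

# Correlation-aware: correlated chunks share a stripe, giving {0,2,5,7} and {1,3,4,6}.
packed = organize([correlated, others], k)

baseline_cost = parity_updates(correlated, baseline, m)  # touches 2 stripes
packed_cost = parity_updates(correlated, packed, m)      # touches 1 stripe
print(baseline_cost, packed_cost)
```

In this toy workload the correlated write touches two stripes under the baseline layout and one stripe under the packed layout, so the modeled parity updates drop from 4 to 2. Real savings depend on how correlations are detected in the access stream and on the code in use, which the paper evaluates with trace-driven experiments.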