Achieving Better Benefits via Flexible Feature Matching in Post-Deduplication Delta Compression

Detailed bibliography
Published in: Proceedings - IEEE International Parallel and Distributed Processing Symposium, pp. 998-1010
Main authors: Yang, Fengkui; Mao, Bo; Liu, Yuhan; Bao, Liang; Jiang, Weipeng; Zhang, Dongying; Li, Chunhua; Zhou, Ke
Format: Conference paper
Language: English
Published: IEEE, 3 June 2025
Subjects: Benefit; Delta Compression; Flexible Feature Matching; Index; Resemblance Detection
ISSN: 1530-2075
Abstract Cloud or distributed storage systems characterized by high data redundancy require effective data reduction techniques to cut storage costs. Post-deduplication delta compression has proven effective because it eliminates both duplicate chunks and similar but non-duplicate chunks. However, existing approaches often rely on fixed-feature matching for resemblance detection, which, while fast, may yield lower reduction ratios and benefits that are not robust across datasets. In this paper, we introduce BePro, a novel system that integrates Flexible Feature Matching (§IV-A) to achieve better benefits in post-deduplication delta compression. BePro employs Gain Filtering (§IV-B) to identify high-gain chunks while discarding low-gain similar chunks, ensuring robust benefits across different datasets. Additionally, BePro implements a new indexing structure, LSH-Delta (§IV-C), to search for similar chunks and uses an Index Load Balancer (§IV-D) for efficient resemblance detection by exploiting the distribution characteristics of similar chunks. Furthermore, the Index Manager (§IV-E) manages memory space overhead, ensuring memory efficiency. We implemented a pipeline prototyping framework to facilitate the evaluation of BePro and other leading techniques. Extensive experiments demonstrate that BePro improves data-reduction ratios by up to 1.15×-2.35× while achieving comparable speed.
Author Yang, Fengkui
Zhou, Ke
Bao, Liang
Zhang, Dongying
Mao, Bo
Jiang, Weipeng
Liu, Yuhan
Li, Chunhua
Author_xml – sequence: 1
  givenname: Fengkui
  surname: Yang
  fullname: Yang, Fengkui
  organization: Huazhong University of Science and Technology,Wuhan National Laboratory for Optoelectronics,Wuhan,China
– sequence: 2
  givenname: Bo
  surname: Mao
  fullname: Mao, Bo
  organization: School of Informatics at Xiamen University,Xiamen,Fujian,China
– sequence: 3
  givenname: Yuhan
  surname: Liu
  fullname: Liu, Yuhan
  organization: Huazhong University of Science and Technology,Wuhan National Laboratory for Optoelectronics,Wuhan,China
– sequence: 4
  givenname: Liang
  surname: Bao
  fullname: Bao, Liang
  organization: Huazhong University of Science and Technology,Wuhan National Laboratory for Optoelectronics,Wuhan,China
– sequence: 5
  givenname: Weipeng
  surname: Jiang
  fullname: Jiang, Weipeng
  organization: Theorylab, 2012 labs, Huawei Technologies Co., Ltd.,Beijing,China
– sequence: 6
  givenname: Dongying
  surname: Zhang
  fullname: Zhang, Dongying
  organization: Huazhong University of Science and Technology,Wuhan National Laboratory for Optoelectronics,Wuhan,China
– sequence: 7
  givenname: Chunhua
  surname: Li
  fullname: Li, Chunhua
  organization: Huazhong University of Science and Technology,Wuhan National Laboratory for Optoelectronics,Wuhan,China
– sequence: 8
  givenname: Ke
  surname: Zhou
  fullname: Zhou, Ke
  email: zhke@hust.edu.cn
  organization: Huazhong University of Science and Technology,Wuhan National Laboratory for Optoelectronics,Wuhan,China
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/IPDPS64566.2025.00093
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Xplore
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 9798331532376
EISSN 1530-2075
EndPage 1010
ExternalDocumentID 11078474
Genre orig-research
GrantInformation_xml – fundername: National Natural Science Foundation of China
  grantid: 62232007
  funderid: 10.13039/501100001809
GroupedDBID 6IE
6IF
6IH
6IK
6IL
6IN
AAJGR
AAWTH
ABLEC
ADZIZ
ALMA_UNASSIGNED_HOLDINGS
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
IEGSK
IPLJI
OCL
RIE
RIL
ID FETCH-LOGICAL-i106t-ba5a69e7cf1838f99559c0fc0645f23716b165ec9bbd7e8f620eaba0ce60420a3
IEDL.DBID RIE
ISICitedReferencesCount 0
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001552207700085&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Aug 27 02:13:52 EDT 2025
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-i106t-ba5a69e7cf1838f99559c0fc0645f23716b165ec9bbd7e8f620eaba0ce60420a3
PageCount 13
ParticipantIDs ieee_primary_11078474
PublicationCentury 2000
PublicationDate 2025-June-3
PublicationDateYYYYMMDD 2025-06-03
PublicationDate_xml – month: 06
  year: 2025
  text: 2025-June-3
  day: 03
PublicationDecade 2020
PublicationTitle Proceedings - IEEE International Parallel and Distributed Processing Symposium
PublicationTitleAbbrev IPDPS
PublicationYear 2025
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0020349
Score 1.910854
Snippet Cloud or distributed storage systems characterized by high data redundancy necessitate effective data reduction techniques to reduce storage costs....
SourceID ieee
SourceType Publisher
StartPage 998
SubjectTerms Benefit
Delta Compression
Flexible Feature Matching
Index
Resemblance Detection
Title Achieving Better Benefits via Flexible Feature Matching in Post-Deduplication Delta Compression
URI https://ieeexplore.ieee.org/document/11078474
WOSCitedRecordID wos001552207700085&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint