Achieving Better Benefits via Flexible Feature Matching in Post-Deduplication Delta Compression

Cloud or distributed storage systems characterized by high data redundancy necessitate effective data reduction techniques to reduce storage costs. Post-deduplication delta compression has proven effective by eliminating both duplicated and similar yet non-duplicated chunks. However, existing approa...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Proceedings - IEEE International Parallel and Distributed Processing Symposium S. 998 - 1010
Hauptverfasser: Yang, Fengkui, Mao, Bo, Liu, Yuhan, Bao, Liang, Jiang, Weipeng, Zhang, Dongying, Li, Chunhua, Zhou, Ke
Format: Tagungsbericht
Sprache:Englisch
Veröffentlicht: IEEE 03.06.2025
Schlagworte:
ISSN:1530-2075
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Cloud or distributed storage systems characterized by high data redundancy necessitate effective data reduction techniques to reduce storage costs. Post-deduplication delta compression has proven effective by eliminating both duplicated and similar yet non-duplicated chunks. However, existing approaches often rely on fixed-feature matching for resemblance detection, which, while fast, may lead to lower reduction ratios and not robust benefits across various datasets. In this paper, we introduce BePro, a novel system that integrates Flexible Feature Matching (§IV-A) to achieve better benefits in post-deduplication delta compression. BePro employs Gain Filtering (§IV-B) to identify high-gain chunks while discarding low-gain similar chunks, ensuring robust benefits across different datasets. Additionally, BePro implements a new indexing structure, LSH-Delta (§IV-C), to search for similar chunks and utilizes Index Load Balancer (§IV-D) for efficient resemblance detection by exploiting the distribution characteristics of similar chunks. Furthermore, the Index Manager (§IV-E) skillfully manages memory space overhead, ensuring memory efficiency. We implemented a pipeline prototyping framework to facilitate the evaluation of BePro and other leading techniques. Extensive experiments demonstrate that BePro improves the data-reduction ratios by up to 1.15 \times-2.35 \times while achieving comparable speed.
ISSN:1530-2075
DOI:10.1109/IPDPS64566.2025.00093