Achieving Better Benefits via Flexible Feature Matching in Post-Deduplication Delta Compression
| Published in: | Proceedings - IEEE International Parallel and Distributed Processing Symposium, pp. 998-1010 |
|---|---|
| Main authors: | , , , , , , , |
| Format: | Conference proceeding |
| Language: | English |
| Published: | IEEE, 3 June 2025 |
| Subjects: | |
| ISSN: | 1530-2075 |
| Online access: | Full text |
| Summary: | Cloud and distributed storage systems with high data redundancy need effective data reduction techniques to lower storage costs. Post-deduplication delta compression has proven effective because it eliminates both duplicate chunks and similar but non-duplicate chunks. However, existing approaches often rely on fixed-feature matching for resemblance detection, which is fast but may yield lower reduction ratios and inconsistent benefits across datasets. In this paper, we introduce BePro, a novel system that integrates Flexible Feature Matching (§IV-A) to achieve better benefits in post-deduplication delta compression. BePro employs Gain Filtering (§IV-B) to identify high-gain chunks and discard low-gain similar chunks, ensuring robust benefits across different datasets. Additionally, BePro implements a new indexing structure, LSH-Delta (§IV-C), to search for similar chunks, and uses an Index Load Balancer (§IV-D) that exploits the distribution characteristics of similar chunks for efficient resemblance detection. Furthermore, the Index Manager (§IV-E) controls memory space overhead to keep the index memory-efficient. We implemented a pipelined prototype framework to evaluate BePro and other leading techniques. Extensive experiments demonstrate that BePro improves data-reduction ratios by up to 1.15×–2.35× while achieving comparable speed. |
| DOI: | 10.1109/IPDPS64566.2025.00093 |