Pushing Collaborative Data Deduplication to the Network Edge: An Optimization Framework and System Design

Bibliographic details
Published in: IEEE Transactions on Network Science and Engineering, Vol. 9, No. 4, pp. 2110-2122
Main authors: Li, Shijing, Lan, Tian, Balasubramanian, Bharath, Lee, Hee Won, Ra, Moo-Ryong, Panta, Rajesh Krishna
Format: Journal Article
Language: English
Published: Piscataway, The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 1 July 2022
ISSN:2327-4697, 2334-329X
Online access: Full text
Description
Abstract: Edge computing has become a new computing paradigm with explosive growth in recent years. We consider the problem of pushing data deduplication to the network edge and propose a new framework for distributed edge-facilitated deduplication (EF-dedup). Deduplication at the network edge allows us to exploit the high degree of geographic and temporal correlation in edge data to achieve space efficiency. By leveraging distributed computing power available on the edge in a collaborative fashion, the edge nodes can effectively suppress duplicated edge data, consuming considerably less space and WAN bandwidth. To this end, we partition the edge nodes into disjoint collaborative clusters, maintain a deduplication index structure across them using a distributed key-value store, and perform deduplication within those clusters. However, this partitioning problem is very challenging and requires the optimization of a novel tradeoff: edge nodes with highly correlated data may not always be within the same edge cloud, with non-trivial network cost among them. We formulate a joint storage and network optimization problem with different design objectives, such as arbitrary partitioning and balanced partitioning of edge nodes. The problem is shown to be NP-hard in general. Then, an optimization framework with efficient algorithms is developed and is proven to achieve a closed-form competitive ratio. Our experiments, performed on edge nodes in a corporate lab and a central cloud at AWS, demonstrate that EF-dedup achieves 67.4-133.7% better deduplication throughput than purely cloud-based techniques and achieves 20.0-62.6% lower aggregate cost in terms of the network-storage trade-off as compared to approaches that solely favor one over the other.
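The storage-versus-network trade-off described in the abstract can be illustrated with a small sketch. This is not the paper's formulation or algorithm: the fixed-size chunking, the additive cost with weight `alpha`, and the names `chunk_fingerprints`, `cluster_unique_chunks`, `partition_cost`, `node_data`, and `link_cost` are all assumptions made for illustration, and a local Python set stands in for the distributed key-value store that holds a cluster's deduplication index.

```python
import hashlib
from itertools import combinations


def chunk_fingerprints(data: bytes, chunk_size: int = 4) -> list[str]:
    """Split data into fixed-size chunks and return one SHA-256 digest per chunk."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def cluster_unique_chunks(cluster, node_data, chunk_size: int = 4) -> int:
    """Count the chunks stored after deduplicating within one cluster.

    The set plays the role of the cluster's shared deduplication index
    (assumption: a local set instead of a distributed key-value store).
    """
    index = set()
    for node in cluster:
        index.update(chunk_fingerprints(node_data[node], chunk_size))
    return len(index)


def partition_cost(partition, node_data, link_cost, alpha=1.0, chunk_size=4) -> float:
    """Toy aggregate cost: stored chunks plus alpha times intra-cluster link cost."""
    storage = sum(cluster_unique_chunks(c, node_data, chunk_size) for c in partition)
    network = sum(
        link_cost[frozenset(pair)]
        for c in partition
        for pair in combinations(sorted(c), 2)
    )
    return storage + alpha * network


# Hypothetical edge nodes: "a" and "b" share one chunk and are close together;
# "c" shares nothing with them and is far away.
node_data = {"a": b"AAAABBBB", "b": b"AAAACCCC", "c": b"DDDDEEEE"}
link_cost = {
    frozenset({"a", "b"}): 0.5,
    frozenset({"a", "c"}): 3.0,
    frozenset({"b", "c"}): 3.0,
}

no_sharing = partition_cost([{"a"}, {"b"}, {"c"}], node_data, link_cost)  # 6.0
clustered = partition_cost([{"a", "b"}, {"c"}], node_data, link_cost)     # 5.5
one_cluster = partition_cost([{"a", "b", "c"}], node_data, link_cost)     # 11.5
```

In this toy setting, grouping only the correlated, nearby nodes gives the lowest cost: isolating every node forgoes the shared chunk, while a single large cluster pays for expensive links without saving any additional storage.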
DOI:10.1109/TNSE.2022.3155357