PIMDup: An Optimized Deduplication Design on a Real Processing-in-Memory System

Detailed bibliography
Published in: 2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1-7
Main authors: Yeh, Chun-Le; Chen, Liang-Chi; Ho, Chien-Chung; Chang, Yu-Ming; Chang, Da-Wei
Format: Conference paper
Language: English
Published: IEEE, 22 June 2025
Description
Summary: Data deduplication enhances storage efficiency through lossless compression but is often hindered by the chunking process, which requires scanning the entire dataset. While traditional methods leveraging conventional architectures and hardware accelerators (e.g., GPUs and FPGAs) have been developed to address this issue, they continue to face challenges related to excessive data movement and the associated performance degradation. These limitations stem from the von Neumann architecture, in which computation and storage are separated in a processor-centric design, necessitating multiple memory-hierarchy traversals and causing inefficiencies. To overcome these challenges, we explore UPMEM's DPU, a processing-in-memory (PIM) technology that reduces data movement by performing computations directly within memory. However, designing a deduplication system for DPUs presents unique obstacles, including restricted inter-DPU data sharing, the absence of native multiplication support, and significant DPU-CPU communication overhead. In response, we propose PIMDup, a DPU-optimized deduplication system that addresses these constraints through efficient parallelization, DPU-friendly chunking techniques, and reduced data transfer volumes. Experimental results demonstrate that PIMDup improves chunking performance without compromising deduplication accuracy, achieving a 1.67× speedup over CPU-based systems while maintaining 100% result consistency.
DOI: 10.1109/DAC63849.2025.11133045
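The summary notes that chunking must scan the whole dataset and that DPUs lack native multiplication support, which makes rolling hashes that multiply on every byte (for example, a Rabin-Karp polynomial hash) a poor fit. The Python sketch below illustrates one generic multiplication-free alternative, Gear-based content-defined chunking (the rolling hash used in FastCDC), followed by SHA-256 fingerprinting and duplicate detection. It is not PIMDup's algorithm, which the summary does not describe, and it runs on a host CPU rather than on a DPU. The parameter values (MASK_BITS, MIN_SIZE, MAX_SIZE), the seeded Gear table, and the helper names gear_chunk, deduplicate and restore are hypothetical choices made only for illustration.

```python
import hashlib
import random

# Hypothetical parameters, chosen only for illustration.
HASH_MASK_32 = 0xFFFFFFFF                 # keep the rolling hash within 32 bits
MASK_BITS = 6                             # 6 high bits must be zero: roughly 1-in-64 chance per position
BOUNDARY_MASK = ((1 << MASK_BITS) - 1) << (32 - MASK_BITS)
MIN_SIZE = 32                             # no cut before 32 bytes (covers the 32-byte hash window)
MAX_SIZE = 256                            # force a cut once a chunk reaches 256 bytes

# Gear table: 256 pseudo-random 32-bit values, seeded so chunking is reproducible.
_rng = random.Random(2025)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def gear_chunk(data: bytes) -> list[bytes]:
    """Split data into content-defined chunks with a Gear rolling hash.

    Each hash update uses only a shift, an addition, and a table lookup,
    so no multiplication is needed.
    """
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & HASH_MASK_32
        size = i - start + 1
        if (size >= MIN_SIZE and (h & BOUNDARY_MASK) == 0) or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0                         # restart the hash for the next chunk
    if start < len(data):
        chunks.append(data[start:])       # trailing partial chunk
    return chunks


def deduplicate(files: list[bytes]) -> tuple[dict[bytes, bytes], list[list[bytes]]]:
    """Chunk each file, fingerprint chunks with SHA-256, and store each unique chunk once."""
    store: dict[bytes, bytes] = {}        # fingerprint -> chunk contents
    recipes: list[list[bytes]] = []       # per file: ordered fingerprints needed to rebuild it
    for data in files:
        recipe = []
        for chunk in gear_chunk(data):
            fp = hashlib.sha256(chunk).digest()
            store.setdefault(fp, chunk)   # keep only the first copy of each chunk
            recipe.append(fp)
        recipes.append(recipe)
    return store, recipes


def restore(store: dict[bytes, bytes], recipe: list[bytes]) -> bytes:
    """Rebuild a file by concatenating its chunks in recipe order."""
    return b"".join(store[fp] for fp in recipe)


if __name__ == "__main__":
    rng = random.Random(7)
    a = bytes(rng.getrandbits(8) for _ in range(5000))
    b = bytes(rng.getrandbits(8) for _ in range(3000))
    store, recipes = deduplicate([a, b, a])   # the third file duplicates the first
    total = sum(len(r) for r in recipes)
    print(f"{total} chunks referenced, {len(store)} unique chunks stored")
    assert restore(store, recipes[2]) == a
```

The boundary test checks the high bits of the hash because, with h = (h << 1) + GEAR[byte], bit k depends only on the last k + 1 bytes. With a 32-bit hash and MIN_SIZE of at least 32, each boundary decision therefore depends on a fixed 32-byte window of content, so boundaries tend to resynchronize after an insertion or deletion instead of shifting for the rest of the stream.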