Enhancing Practicality of Memory Compression for GPUs with High-Throughput Simplifications

Detailed bibliography
Title: Enhancing Practicality of Memory Compression for GPUs with High-Throughput Simplifications
Authors: Renz, Manuel, Lal, Sohan
Publisher information: ACM, 2025.
Publication year: 2025
Subjects: Computer Science, Information and General Works::004: Computer Sciences, GPUs, Technology::621: Applied Physics::621.3: Electrical Engineering, Electronic Engineering, Memory Compression, Throughput
Description: Memory-bound Graphics Processing Unit (GPU) applications are limited by memory bandwidth, as the rapid growth in computational power has outpaced the slower increase in memory bandwidth. Consequently, approaches such as memory compression are gaining prominence to synthetically enhance memory bandwidth and accelerate bandwidth-limited applications. Traditionally, compression techniques have been tailored towards achieving high compression ratios without considering the high bandwidth of modern GPU memory systems, which makes hardware integration costly and impractical. We analyze several state-of-the-art memory compression techniques and find that the throughput of Bit-Plane Compression (BPC) and Frequent Pattern Compression (FPC) is limited by Zero Run-Length Encoding (ZRLE). ZRLE efficiently compresses zero blocks; however, GPUs often do not benefit, as even heavily compressed blocks require the transfer of a full Memory-Access Granularity (MAG). We propose simplifying the BPC and FPC techniques by removing ZRLE and introducing a fixed-size tag section. Together with higher word-level parallelism, our simplifications increase compressor throughput by 14.0× and decompressor throughput by 13.5× without any loss in the effective compression ratio. Additionally, the area required for hardware integration is significantly reduced; for instance, the area cost of BPC is decreased by 3.6× and its power consumption by 1.8×, making hardware integration of memory compression more practical and cost-effective.
Document type: Conference object
Language: English
DOI: 10.15480/882.15123
Accession number: edsair.doi...........3e890d7b8a9e1bda1460ee6a65fe276f
Database: OpenAIRE
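
The abstract's point about Memory-Access Granularity can be illustrated with a small calculation. The sketch below is not taken from the paper: the 128-byte block size, the 32-byte MAG, and the function names are assumptions chosen for illustration. It shows why shrinking a block far below one MAG does not improve the effective compression ratio, since the memory system still transfers a whole MAG.

```python
import math

# Assumed values for illustration only; the record does not state them.
BLOCK_BYTES = 128  # hypothetical uncompressed block size
MAG_BYTES = 32     # hypothetical memory-access granularity


def transferred_bytes(compressed_bytes, mag=MAG_BYTES, block=BLOCK_BYTES):
    """Bytes actually moved: compressed size rounded up to whole MAGs.

    At least one MAG is always transferred, and never more than the
    uncompressed block (an incompressible block is sent as-is).
    """
    bursts = max(1, math.ceil(compressed_bytes / mag))
    return min(bursts * mag, block)


def raw_ratio(compressed_bytes, block=BLOCK_BYTES):
    """Compression ratio that ignores the transfer granularity."""
    return block / compressed_bytes


def effective_ratio(compressed_bytes, mag=MAG_BYTES, block=BLOCK_BYTES):
    """Compression ratio measured in bytes actually transferred."""
    return block / transferred_bytes(compressed_bytes, mag, block)


if __name__ == "__main__":
    # Hypothetical compressed sizes for the same zero-heavy block.
    for size in (2, 30, 33, 128):
        print(size, raw_ratio(size), effective_ratio(size))
```

Under these assumed sizes, a block compressed to 2 bytes and one compressed to 30 bytes both transfer a single 32-byte MAG, so both have an effective ratio of 4.0 even though their raw ratios differ widely. This is the sense in which removing an encoding step that only helps very small outputs need not reduce the effective compression ratio.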