Enhancing Practicality of Memory Compression for GPUs with High-Throughput Simplifications

Bibliographic Details
Title: Enhancing Practicality of Memory Compression for GPUs with High-Throughput Simplifications
Authors: Renz, Manuel; Lal, Sohan
Publisher Information: ACM, 2025.
Publication Year: 2025
Keywords: Computer Science, Information and General Works::004: Computer Sciences, GPUs, Technology::621: Applied Physics::621.3: Electrical Engineering, Electronic Engineering, Memory Compression, Throughput
Description: Memory-bound Graphics Processing Unit (GPU) applications are limited by memory bandwidth, as the rapid growth in computational power has outpaced the slower increase in memory bandwidth. Consequently, approaches such as memory compression are gaining prominence as a way to synthetically enhance memory bandwidth and accelerate bandwidth-limited applications. Traditionally, compression techniques have been tailored towards achieving high compression ratios without considering the high bandwidth of modern GPU memory systems, which makes hardware integration costly and impractical. We analyze several state-of-the-art memory compression techniques and find that the throughput of Bit-Plane Compression (BPC) and Frequent Pattern Compression (FPC) is limited by Zero Run-Length Encoding (ZRLE). ZRLE efficiently compresses zero blocks; however, GPUs often do not benefit, as even heavily compressed blocks require the transfer of a full Memory-Access Granularity (MAG). We propose simplifying the BPC and FPC techniques by removing ZRLE and introducing a fixed-size tag section. Together with higher word-level parallelism, our simplifications increase compressor throughput by 14.0× and decompressor throughput by 13.5× without any loss in the effective compression ratio. Additionally, the area required for hardware integration is significantly reduced; for instance, the area cost of BPC is decreased by 3.6× and its power consumption by 1.8×, making hardware integration of memory compression more practical and cost-effective.
Publication Type: Conference object
Language: English
DOI: 10.15480/882.15123
Document Code: edsair.doi...........3e890d7b8a9e1bda1460ee6a65fe276f
Database: OpenAIRE
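
Note on the effective compression ratio mentioned in the abstract: the statement that even heavily compressed blocks require the transfer of a full MAG can be illustrated with a small calculation. The sketch below is not taken from the paper. The 128-byte block size, the 32-byte MAG, and both function names are assumptions chosen for illustration only.

```python
import math


def effective_compressed_size(compressed_bytes: int, mag_bytes: int = 32) -> int:
    """Round a compressed size up to a whole number of MAG transfers.

    mag_bytes=32 is an assumed value, not one stated in the abstract.
    At least one MAG is always transferred, even for an all-zero block.
    """
    transfers = max(1, math.ceil(compressed_bytes / mag_bytes))
    return transfers * mag_bytes


def effective_compression_ratio(block_bytes: int, compressed_bytes: int,
                                mag_bytes: int = 32) -> float:
    """Ratio of the uncompressed block size to the bytes actually transferred."""
    return block_bytes / effective_compressed_size(compressed_bytes, mag_bytes)


# Assumed 128-byte block: compressing it to 4 bytes or to 30 bytes costs the
# same single 32-byte transfer, so both cases yield the same effective ratio.
print(effective_compression_ratio(128, 4))   # 4.0
print(effective_compression_ratio(128, 30))  # 4.0
print(effective_compression_ratio(128, 33))  # 2.0
```

Under these assumptions, shrinking a block far below one MAG yields no additional bandwidth saving, which is consistent with the abstract's claim that removing ZRLE need not reduce the effective compression ratio.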