BLOOM: Bit-Slice Framework for DNN Acceleration with Mixed-Precision
| Published in: | 2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1-7 |
|---|---|
| Main authors: | , , , , , , , |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 22 June 2025 |
| Subjects: | |
| Online access: | Get full text |
| Summary: | Deep neural networks (DNNs) have revolutionized numerous AI applications, but their vast model sizes and limited hardware resources present significant deployment challenges. Model quantization offers a promising solution to bridge the gap between DNN size and hardware capacity. While INT8 quantization has been widely used, recent research has pushed for even lower precision, such as INT4. However, the presence of outliers (values with unusually large magnitudes) limits the effectiveness of current quantization techniques. Previous compression-based acceleration methods that incorporate outlier-aware encoding introduce complex logic. A critical issue we have identified is that serialization and deserialization dominate the encoding/decoding time in these compression workflows, leading to substantial performance penalties during workflow execution. To address this challenge, we introduce a novel computing approach and a compatible architecture design named "BLOOM". BLOOM leverages the strengths of the "bit-slicing" method, effectively combining structured mixed-precision and bit-level sparsity with adaptive dataflow techniques. The key insight of BLOOM is that outliers require higher precision, while normal values can be processed at lower precision. By interleaving 4-bit values, we efficiently exploit the inherent sparsity in the high-precision components. As a result, the BLOOM-based accelerator outperforms existing outlier-aware accelerators, achieving an average speedup of 1.2× to 4.0× and an energy reduction of 24.6% to 71.3% without loss of model accuracy. |
| DOI: | 10.1109/DAC63849.2025.11133246 |
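
The summary's key insight, that outliers need higher precision while normal values fit in a low-precision slice, can be illustrated with a generic bit-slicing sketch. This is not the BLOOM architecture or its encoding: the sign-magnitude split, the function names `slice_weight` and `sliced_dot`, and the sample data are all assumptions made for illustration. It only shows that an INT8 weight splits into two 4-bit magnitude slices, that the high slice is zero for small values, and that a dot product can be rebuilt from per-slice partial sums while skipping the zero high slices.

```python
def slice_weight(w):
    """Split a signed 8-bit weight into (sign, high nibble, low nibble).

    Assumed sign-magnitude layout: abs(w) == 16 * high + low.
    Weights with abs(w) < 16 ("normal" values) have high == 0.
    """
    if not -128 <= w <= 127:
        raise ValueError("weight must fit in a signed 8-bit integer")
    sign = -1 if w < 0 else 1
    magnitude = abs(w)
    return sign, magnitude >> 4, magnitude & 0xF


def sliced_dot(weights, activations):
    """Dot product computed slice by slice.

    Returns (result, high_ops), where high_ops counts the multiplications
    actually performed on the high slice (zero high slices are skipped).
    """
    low_acc = 0
    high_acc = 0
    high_ops = 0
    for w, a in zip(weights, activations):
        sign, high, low = slice_weight(w)
        low_acc += sign * low * a
        if high != 0:  # only outliers contribute to the high slice
            high_acc += sign * high * a
            high_ops += 1
    # The high slice carries a weight of 2**4 relative to the low slice.
    return 16 * high_acc + low_acc, high_ops


if __name__ == "__main__":
    # Hypothetical data: mostly small weights plus two outliers (100, -90).
    weights = [3, -5, 100, 7, -2, -90, 1, 0]
    activations = [2, 4, -1, 3, 5, 1, -6, 7]
    result, high_ops = sliced_dot(weights, activations)
    reference = sum(w * a for w, a in zip(weights, activations))
    print(result == reference, high_ops, len(weights))
```

In this toy input only two of the eight weights have a non-zero high slice, so the high-slice pass does a quarter of the multiplications of the low-slice pass. That is the kind of bit-level sparsity the summary refers to; how BLOOM interleaves and schedules the slices in hardware is described in the paper itself.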