BLOOM: Bit-Slice Framework for DNN Acceleration with Mixed-Precision

Bibliographic Details
Published in: 2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1 - 7
Main Authors: Liu, Fangxin, Yang, Ning, Wang, Zongwu, Zhu, Xuanpeng, Yao, Haidong, Xiong, Xiankui, Jiang, Li, Guan, Haibing
Format: Conference Proceeding
Language: English
Published: IEEE 22.06.2025
Description
Summary: Deep neural networks (DNNs) have revolutionized numerous AI applications, but their vast model sizes and limited hardware resources present significant deployment challenges. Model quantization offers a promising solution to bridge the gap between DNN size and hardware capacity. While INT8 quantization has been widely used, recent research has pushed toward even lower precision, such as INT4. However, the presence of outliers (values with unusually large magnitudes) limits the effectiveness of current quantization techniques. Previous compression-based acceleration methods that incorporate outlier-aware encoding introduce complex logic. A critical issue we have identified is that serialization and deserialization dominate the encoding/decoding time in these compression workflows, leading to substantial performance penalties during execution. To address this challenge, we introduce a novel computing approach and a compatible architecture design named "BLOOM". BLOOM leverages the strengths of the bit-slicing method, effectively combining structured mixed precision and bit-level sparsity with adaptive dataflow techniques. The key insight of BLOOM is that outliers require higher precision, while normal values can be processed at lower precision. By interleaving 4-bit values, we efficiently exploit the inherent sparsity in the high-precision components. As a result, the BLOOM-based accelerator outperforms existing outlier-aware accelerators with an average 1.2x-4.0x speedup and 24.6%-71.3% energy reduction, without model accuracy loss.
DOI: 10.1109/DAC63849.2025.11133246
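The bit-slicing idea in the summary can be sketched in a few lines. This is an illustrative toy in NumPy under my own assumptions, not the BLOOM implementation; the function names (`bit_slice_int8`, `reconstruct`, `redundant_high`) are hypothetical. Each signed INT8 value is split into a low and a high 4-bit slice; for every non-outlier value (one that fits in a signed 4-bit range) the high slice is pure sign extension of the low slice and carries no information, which is the bit-level sparsity in the high-precision components that the summary refers to.

```python
import numpy as np

# Illustrative sketch only (not the paper's implementation): split signed
# INT8 values into two 4-bit slices and detect which high slices are
# redundant (sign extension), i.e. which values are not outliers.

def bit_slice_int8(w):
    """Return (high, low) 4-bit slices of an int8 array."""
    u = w.astype(np.int16) & 0xFF  # two's-complement byte view in [0, 255]
    return ((u >> 4) & 0x0F).astype(np.uint8), (u & 0x0F).astype(np.uint8)

def reconstruct(high, low):
    """Reassemble the original int8 values from the two 4-bit slices."""
    u = (high.astype(np.int16) << 4) | low.astype(np.int16)
    return ((u + 128) % 256 - 128).astype(np.int8)  # unsigned byte -> int8

def redundant_high(high, low):
    """True where the high slice is only sign extension of the low slice,
    i.e. the value fits in signed 4 bits and is not an outlier."""
    return high == np.where(low & 0x8, 0xF, 0x0)

w = np.array([3, -2, 7, 90, -120, 5], dtype=np.int8)  # 90 and -120 act as outliers
hi, lo = bit_slice_int8(w)
```

An outlier-aware bit-slice accelerator in this spirit would only issue work for the high slices where `redundant_high` is false; here that is 2 of the 6 values, while the low slices of all values are processed at uniform 4-bit precision.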