DenSparSA: A Balanced Systolic Array Approach for Dense and Sparse Matrix Multiplication

Bibliographic Details
Published in: 2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1-7
Main Authors: Wang, Ziheng; Sun, Ruiqi; He, Xin; Ma, Tianrui; Zou, An
Format: Conference Proceeding
Language: English
Published: IEEE 22.06.2025
Description
Summary: Numerous studies have proposed hardware architectures to accelerate sparse matrix multiplication, but these approaches often incur substantial area and power overhead, which significantly compromises their use in dense scenarios. Systolic arrays, on the other hand, deliver high efficiency for dense matrix operations, but applying them to sparse matrices remains challenging. An ideal design should process both dense and sparse matrices with high efficiency to satisfy performance and versatility requirements. In this paper, we introduce DenSparSA, a balanced, systolic-array-centric architecture that executes sparse matrix computations with minimal overhead on the original dense matrix computations. DenSparSA supports both single-side and dual-side unstructured sparse matrix multiplication with high efficiency. At the same time, the additional hardware required for managing sparsity is compact and decoupled from the conventional systolic array, allowing for minimal power overhead when switched back to dense matrix operations via circuit gating. The proposed design is implemented with Nangate 45 nm. Implementation results show that DenSparSA achieves speedups of 1.9× to 22× over the classic systolic array for sparse workloads, while maintaining relatively low area and power overhead. For dense workloads, the power overhead can be reduced to 12% for BF16 and 5% for FP32. Compared with existing solutions for sparse acceleration, DenSparSA delivers competitive (0.82× to 1.32×) efficiency in sparse scenarios and 1.17× to 2.28× better efficiency in dense scenarios, indicating a better balance between the two.
DOI: 10.1109/DAC63849.2025.11133069