FeKAN: Efficient Kolmogorov-Arnold Networks Accelerator Using FeFET-based CAM and LUT

Bibliographic Details
Published in:	2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1-7
Main Authors:	Yu, Xuliang; Qian, Yu; Yin, Xunzhao; Zhuo, Cheng; Zhao, Liang
Format:	Conference paper
Language:	English
Published:	IEEE, 22 June 2025
Subjects:	Computational efficiency, Encoding, Energy efficiency, Interpolation, Memory management, Optimization, Space exploration, Splines (mathematics), Table lookup, Throughput

Description
Summary:	Kolmogorov-Arnold networks (KANs) have emerged as a promising alternative to multilayer perceptrons (MLPs) because their B-spline basis activations (BBA) adaptively learn complex dependencies. However, existing in-memory accelerators optimized for MLP-based DNNs are primarily designed for vector-matrix multiplication (VMM), making them inefficient for the dynamic and recursive B-spline interpolation (BSI) operations required by KANs. In this work, we propose FeKAN, an FeFET-based architecture designed to accelerate BBA operations. First, we develop a software-hardware co-optimized framework for mapping B-spline basis functions (BBF), leveraging a two-stage design space exploration (DSE) algorithm in combination with FeFET-based look-up tables (LUT) and content-addressable memory (CAM). This framework translates dynamic BSI operations into static codebook lookups, achieving a balanced trade-off between memory and computational efficiency. Second, we propose a compressed-sparse-column (CSC) based encoding for B-spline basis functions and a grouped-computation strategy to reduce memory and energy consumption. Third, we propose a grouped-pipeline optimization strategy to mitigate data dependencies, significantly enhancing computational efficiency. Experimental results demonstrate that FeKAN achieves up to 150.68K× and 4664× higher throughput and up to 606.87× and 11196× greater energy efficiency than an Intel Xeon Silver 4310 CPU and an NVIDIA A6000 GPU, respectively.
DOI:	10.1109/DAC63849.2025.11132687
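
Illustrative sketch: The summary describes turning recursive B-spline interpolation into static table lookups and storing the basis values in CSC form. The Python sketch below illustrates that general idea only and is not FeKAN's actual mapping, which the summary does not specify. It uses the standard Cox-de Boor recursion, a uniformly quantized input grid, and the textbook CSC layout. All function names, the knot vector, and the 64-level quantization are assumptions chosen for illustration.

```python
def bspline_basis(i, k, t, x):
    """Cox-de Boor recursion: i-th B-spline basis of degree k on knots t at x."""
    if k == 0:
        return 1.0 if t[i] <= x < t[i + 1] else 0.0
    left = 0.0
    if t[i + k] != t[i]:
        left = (x - t[i]) / (t[i + k] - t[i]) * bspline_basis(i, k - 1, t, x)
    right = 0.0
    if t[i + k + 1] != t[i + 1]:
        right = ((t[i + k + 1] - x) / (t[i + k + 1] - t[i + 1])
                 * bspline_basis(i + 1, k - 1, t, x))
    return left + right


def build_lut(k, t, lo, hi, levels):
    """Precompute every basis value at the center of each quantization cell."""
    n_basis = len(t) - k - 1
    step = (hi - lo) / levels
    table = []
    for q in range(levels):
        x = lo + (q + 0.5) * step
        table.append([bspline_basis(i, k, t, x) for i in range(n_basis)])
    return table, step


def lookup(table, step, lo, x):
    """Replace the recursion with one static lookup on the quantized input."""
    q = min(max(int((x - lo) / step), 0), len(table) - 1)
    return table[q]


def to_csc(table):
    """Textbook CSC encoding: nonzero values, their row indices, column pointers."""
    values, row_idx, col_ptr = [], [], [0]
    for c in range(len(table[0])):
        for r, row in enumerate(table):
            if row[c] != 0.0:
                values.append(row[c])
                row_idx.append(r)
        col_ptr.append(len(values))
    return values, row_idx, col_ptr


# Hypothetical setup: cubic splines (k = 3) on 5 uniform intervals over [0, 1).
K, G = 3, 5
knots = [(j - K) / G for j in range(G + 2 * K + 1)]
table, step = build_lut(K, knots, 0.0, 1.0, levels=64)
values, row_idx, col_ptr = to_csc(table)
```

At any input inside the grid only k + 1 basis functions are nonzero, so each table row is mostly zeros. That sparsity is what makes a compressed encoding such as CSC attractive. The lookup trades a small quantization error for the removal of the recursive evaluation at inference time.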