FeKAN: Efficient Kolmogorov-Arnold Networks Accelerator Using FeFET-based CAM and LUT
| Published in: | 2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1-7 |
|---|---|
| Main authors: | , , , , |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 22 June 2025 |
| Subjects: | |
| Summary: | Kolmogorov-Arnold networks (KANs) have emerged as a promising alternative to multilayer perceptrons (MLPs) due to their ability to adaptively learn complex dependencies through B-spline basis activations (BBA). However, existing in-memory accelerators optimized for MLP-based DNNs are primarily designed for vector-matrix multiplication (VMM), making them inefficient for the dynamic and recursive B-spline interpolation (BSI) operations required by KANs. In this work, we propose FeKAN, an FeFET-based architecture designed to accelerate BBA operations. First, we develop a software-hardware co-optimized framework for mapping B-spline basis functions (BBF), leveraging a two-stage design space exploration (DSE) algorithm in combination with FeFET-based look-up tables (LUTs) and content-addressable memory (CAM). This framework translates dynamic BSI operations into static codebook lookups, achieving a balanced trade-off between memory and computational efficiency. Second, we propose compressed-sparse-column (CSC) based encoding of the B-spline basis functions and a grouped-computation strategy to reduce memory and energy consumption. Third, we propose a grouped-pipeline optimization strategy to mitigate data dependencies, significantly enhancing computation efficiency. Experimental results demonstrate that FeKAN achieves up to 150.68K× and 4664× higher throughput and up to 606.87× and 11196× greater energy efficiency than an Intel Xeon Silver 4310 CPU and an NVIDIA A6000 GPU, respectively. |
| DOI: | 10.1109/DAC63849.2025.11132687 |
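
The abstract describes replacing dynamic, recursive B-spline interpolation with static codebook lookups. The following is a minimal software sketch of that general idea only, not the paper's FeFET CAM/LUT mapping or its DSE algorithm. All names (`bspline_basis`, `build_codebook`, `lookup`, `sparse_row`), the uniform cubic grid, and the 64 quantization levels are assumptions chosen for illustration. The `(index, value)` list is a simplified stand-in for a sparse encoding, not a full CSC layout.

```python
def bspline_basis(i, k, t, x):
    """Cox-de Boor recursion: i-th B-spline basis function of degree k at x."""
    if k == 0:
        return 1.0 if t[i] <= x < t[i + 1] else 0.0
    left = 0.0
    if t[i + k] != t[i]:
        left = (x - t[i]) / (t[i + k] - t[i]) * bspline_basis(i, k - 1, t, x)
    right = 0.0
    if t[i + k + 1] != t[i + 1]:
        right = ((t[i + k + 1] - x) / (t[i + k + 1] - t[i + 1])
                 * bspline_basis(i + 1, k - 1, t, x))
    return left + right


def build_codebook(t, k, lo, hi, levels):
    """Precompute every basis value at `levels` uniformly quantized inputs in [lo, hi)."""
    n_basis = len(t) - k - 1
    step = (hi - lo) / levels
    return [
        [bspline_basis(i, k, t, lo + q * step) for i in range(n_basis)]
        for q in range(levels)
    ]


def lookup(table, lo, hi, x):
    """Replace the recursion at run time with one quantized table read."""
    levels = len(table)
    q = int((x - lo) / (hi - lo) * levels)
    q = min(max(q, 0), levels - 1)  # clamp out-of-range inputs
    return table[q]


def sparse_row(row):
    """Keep only nonzero entries as (basis index, value) pairs."""
    return [(i, v) for i, v in enumerate(row) if v != 0.0]


# Hypothetical setup: cubic splines (K = 3) on G = 5 uniform intervals over [0, 1).
K, G = 3, 5
KNOTS = [(j - K) / G for j in range(G + 2 * K + 1)]
TABLE = build_codebook(KNOTS, K, 0.0, 1.0, levels=64)

row = lookup(TABLE, 0.0, 1.0, 0.5)
print(sparse_row(row))
```

Because a degree-`K` B-spline basis has at most `K + 1` nonzero functions at any input, each codebook row is mostly zeros, which is the property a sparse encoding exploits. The table trades memory (one row per quantization level) for the removal of the run-time recursion; a finer quantization reduces lookup error at the cost of a larger table.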