Flex-SFU: Activation Function Acceleration With Nonuniform Piecewise Approximation
| Published in: | IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 44, No. 11, pp. 4236 - 4248 |
|---|---|
| Main authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.11.2025 |
| Subjects: | |
| ISSN: | 0278-0070, 1937-4151 |
| Online access: | Full text |
| Abstract: | Modern deep neural networks (DNNs) increasingly use activation functions with computationally complex operations. This creates a challenge for current hardware accelerators, which are primarily optimized for convolutions and matrix-matrix multiplications. This work introduces Flex-SFU, a lightweight hardware accelerator for activation functions that uses nonuniform piecewise interpolation and supports multiple data formats, including both linear and quadratic function segments. We optimize the parameters of these function approximations offline to provide drop-in replacements for existing activation functions. Flex-SFU incorporates an address decoding unit based on a hardware binary-tree search, enabling nonuniform interpolation and floating-point support. This approach achieves, on average, a 22.3× improvement in mean squared error compared to previous piecewise linear interpolation methods. Our evaluations, conducted on more than 600 state-of-the-art neural networks and 100 vision transformers, demonstrate that Flex-SFU can, on average, enhance the end-to-end performance of AI hardware accelerators by 35.7%, achieving up to a 3.3× speedup with negligible impact on model accuracy. This improvement comes with an area and power overhead of only 5.9% and 0.8%, respectively, relative to the baseline vector processing unit. Additionally, we demonstrate that Flex-SFU can accelerate training by up to 15.8% by interpolating the derivatives of common activation functions during backpropagation, without impacting either convergence speed or final accuracy. |
|---|---|
| DOI: | 10.1109/TCAD.2025.3558140 |
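The abstract describes nonuniform piecewise interpolation whose segment parameters are fitted offline, with a hardware binary-tree search resolving the segment address at runtime. The Python sketch below is a minimal software analogue of that idea, not the paper's implementation: the names `gelu`, `fit_segments`, and `evaluate` are hypothetical, `numpy.polyfit` stands in for the paper's offline parameter optimization, a `bisect` binary search stands in for the hardware address decoder, and only quadratic segments are shown, whereas Flex-SFU supports both linear and quadratic ones.

```python
import bisect
import numpy as np


def gelu(x):
    """Reference activation (tanh approximation of GELU)."""
    return 0.5 * x * (1.0 + np.tanh(0.7978845608 * (x + 0.044715 * x ** 3)))


def fit_segments(fn, breakpoints, samples_per_segment=64):
    """Offline step: least-squares fit of one quadratic (a, b, c) per segment."""
    coeffs = []
    for lo, hi in zip(breakpoints[:-1], breakpoints[1:]):
        xs = np.linspace(lo, hi, samples_per_segment)
        a, b, c = np.polyfit(xs, fn(xs), deg=2)   # a*x^2 + b*x + c
        coeffs.append((a, b, c))
    return coeffs


def evaluate(x, breakpoints, coeffs):
    """Online step: binary search selects the segment, then evaluate its polynomial."""
    i = bisect.bisect_right(breakpoints, x) - 1   # analogue of the binary-tree address decoder
    i = max(0, min(i, len(coeffs) - 1))           # clamp out-of-range inputs to the edge segments
    a, b, c = coeffs[i]
    return (a * x + b) * x + c                    # Horner form, as a hardware datapath would compute it


if __name__ == "__main__":
    # Nonuniform breakpoints: denser where GELU curves the most.
    breakpoints = [-8.0, -4.0, -2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 4.0, 8.0]
    coeffs = fit_segments(gelu, breakpoints)
    test = np.linspace(-6.0, 6.0, 1001)
    approx = np.array([evaluate(float(x), breakpoints, coeffs) for x in test])
    print(f"MSE over [-6, 6]: {float(np.mean((approx - gelu(test)) ** 2)):.2e}")
```

Placing the breakpoints nonuniformly, denser where the function has the most curvature, is what a scheme like this relies on to reach a lower mean squared error than a uniform piecewise-linear table with the same number of segments.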