Flex-SFU: Activation Function Acceleration With Nonuniform Piecewise Approximation
| Published in: | IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 44, No. 11, pp. 4236 - 4248 |
|---|---|
| Main authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.11.2025 |
| Subjects: | |
| ISSN: | 0278-0070, 1937-4151 |
| Online access: | Full text |
| Abstract: | Modern deep neural networks (DNNs) increasingly use activation functions with computationally complex operations. This creates a challenge for current hardware accelerators, which are primarily optimized for convolutions and matrix-matrix multiplications. This work introduces Flex-SFU, a lightweight hardware accelerator for activation functions that uses nonuniform piecewise interpolation and supports multiple data formats, including both linear and quadratic function segments. We optimize the parameters of these function approximations offline to provide drop-in replacements for existing activation functions. Flex-SFU incorporates an address decoding unit based on a hardware binary-tree search, enabling nonuniform interpolation and floating-point support. This approach achieves, on average, a 22.3× improvement in mean squared error compared to previous piecewise linear interpolation methods. Our evaluations, conducted on more than 600 state-of-the-art neural networks and 100 vision transformers, demonstrate that Flex-SFU can, on average, enhance the end-to-end performance of AI hardware accelerators by 35.7%, achieving up to a 3.3× speedup with negligible impact on model accuracy. This improvement comes with an area and power overhead of only 5.9% and 0.8%, respectively, relative to the baseline vector processing unit. Additionally, we demonstrate that Flex-SFU can accelerate training by up to 15.8% by interpolating the derivatives of common activation functions during backpropagation, without impacting either convergence speed or final accuracy. |
|---|---|
| DOI: | 10.1109/TCAD.2025.3558140 |
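The abstract describes nonuniform piecewise interpolation whose segment parameters are fitted offline, with a hardware binary-tree search resolving the segment address at runtime. The Python sketch below is a minimal software analogue of that idea, not the paper's implementation: the names `gelu`, `fit_segments`, and `evaluate` are hypothetical, `numpy.polyfit` stands in for the paper's offline parameter optimization, a `bisect` binary search stands in for the hardware address decoder, and only quadratic segments are shown, whereas Flex-SFU supports both linear and quadratic ones.

```python
import bisect
import numpy as np


def gelu(x):
    """Reference activation (tanh approximation of GELU)."""
    return 0.5 * x * (1.0 + np.tanh(0.7978845608 * (x + 0.044715 * x ** 3)))


def fit_segments(fn, breakpoints, samples_per_segment=64):
    """Offline step: least-squares fit of one quadratic (a, b, c) per segment."""
    coeffs = []
    for lo, hi in zip(breakpoints[:-1], breakpoints[1:]):
        xs = np.linspace(lo, hi, samples_per_segment)
        a, b, c = np.polyfit(xs, fn(xs), deg=2)   # a*x^2 + b*x + c
        coeffs.append((a, b, c))
    return coeffs


def evaluate(x, breakpoints, coeffs):
    """Online step: binary search selects the segment, then evaluate its polynomial."""
    i = bisect.bisect_right(breakpoints, x) - 1   # analogue of the binary-tree address decoder
    i = max(0, min(i, len(coeffs) - 1))           # clamp out-of-range inputs to the edge segments
    a, b, c = coeffs[i]
    return (a * x + b) * x + c                    # Horner form, as a hardware datapath would compute it


if __name__ == "__main__":
    # Nonuniform breakpoints: denser where GELU curves the most.
    breakpoints = [-8.0, -4.0, -2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 4.0, 8.0]
    coeffs = fit_segments(gelu, breakpoints)
    test = np.linspace(-6.0, 6.0, 1001)
    approx = np.array([evaluate(float(x), breakpoints, coeffs) for x in test])
    print(f"MSE over [-6, 6]: {float(np.mean((approx - gelu(test)) ** 2)):.2e}")
```

Placing the breakpoints nonuniformly, denser where the function has the most curvature, is what a scheme like this relies on to reach a lower mean squared error than a uniform piecewise-linear table with the same number of segments.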