Flex-SFU: Accelerating DNN Activation Functions by Non-Uniform Piecewise Approximation

Detailed bibliography
Published in: 2023 60th ACM/IEEE Design Automation Conference (DAC), pp. 1-6
Main authors: Reggiani, Enrico; Andri, Renzo; Cavigelli, Lukas
Format: Conference paper
Language: English
Publisher details: IEEE, 9 July 2023
Abstract Modern DNN workloads increasingly rely on activation functions consisting of computationally complex operations. This poses a challenge to current accelerators, which are optimized for convolutions and matrix-matrix multiplications. This work presents Flex-SFU, a lightweight hardware accelerator for activation functions that implements non-uniform piecewise interpolation and supports multiple data formats. Non-uniform segments and floating-point numbers are enabled by implementing a binary-tree comparison within the address decoding unit. An SGD-based optimization algorithm with heuristics is proposed to find an interpolation function that reduces the mean squared error. Thanks to non-uniform interpolation and floating-point support, Flex-SFU achieves on average 22.3x better mean squared error compared to previous piecewise linear interpolation approaches. The evaluation with more than 700 computer vision and natural language processing models shows that Flex-SFU can, on average, improve the end-to-end performance of state-of-the-art AI hardware accelerators by 35.7%, achieving up to 3.3x speedup with negligible impact on the models' accuracy when using 32 segments, while introducing area and power overheads of only 5.9% and 0.8%, respectively, relative to the baseline vector processing unit.
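The idea of non-uniform piecewise interpolation with a search-based segment lookup can be sketched in software. The following is a minimal illustration only, not the Flex-SFU design: the binary search from Python's `bisect` module stands in for the binary-tree comparison in the address decoding unit, linear segments are assumed, GELU is chosen as an example activation, and the breakpoints in `BREAKPOINTS` are hypothetical hand-picked values rather than the output of the SGD-based optimizer described in the abstract. All function and variable names are made up for this sketch.

```python
import bisect
import math


def gelu(x):
    """Exact GELU, used here as the reference activation."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def build_table(f, breakpoints):
    """Precompute (slope, intercept) for each segment between sorted breakpoints."""
    table = []
    for lo, hi in zip(breakpoints, breakpoints[1:]):
        slope = (f(hi) - f(lo)) / (hi - lo)
        table.append((slope, f(lo) - slope * lo))
    return table


def pwl_eval(x, breakpoints, table):
    """Evaluate the piecewise-linear approximation at x."""
    # Binary search over the sorted breakpoints picks the segment index.
    idx = bisect.bisect_right(breakpoints, x) - 1
    # Clamp so inputs outside the table extrapolate the first or last segment.
    idx = min(max(idx, 0), len(table) - 1)
    slope, intercept = table[idx]
    return slope * x + intercept


def mse(f, breakpoints, samples):
    """Mean squared error of the approximation over the given sample points."""
    table = build_table(f, breakpoints)
    errors = [(f(x) - pwl_eval(x, breakpoints, table)) ** 2 for x in samples]
    return sum(errors) / len(errors)


# Hypothetical, hand-picked breakpoints: denser near zero, where GELU curves most.
BREAKPOINTS = [-8.0, -4.0, -2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 4.0, 8.0]
TABLE = build_table(gelu, BREAKPOINTS)
SAMPLES = [-8.0 + 16.0 * i / 1000 for i in range(1001)]
GELU_MSE = mse(gelu, BREAKPOINTS, SAMPLES)
```

Because the lookup only needs the breakpoints to be sorted, the segments can have any width, which is what lets a table spend more of its entries where the function curves most. Calling `mse` with a different breakpoint list gives a quick way to compare placements on the same sample grid.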
Author Reggiani, Enrico
Cavigelli, Lukas
Andri, Renzo
Author_xml – sequence: 1
  givenname: Enrico
  surname: Reggiani
  fullname: Reggiani, Enrico
  organization: Huawei Zurich Research Center,Computing Systems Lab,Switzerland
– sequence: 2
  givenname: Renzo
  surname: Andri
  fullname: Andri, Renzo
  email: renzo.andri@huawei.com
  organization: Huawei Zurich Research Center,Computing Systems Lab,Switzerland
– sequence: 3
  givenname: Lukas
  surname: Cavigelli
  fullname: Cavigelli, Lukas
  organization: Huawei Zurich Research Center,Computing Systems Lab,Switzerland
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1109/DAC56929.2023.10247855
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Xplore
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9798350323481
EndPage 6
ExternalDocumentID 10247855
Genre orig-research
GroupedDBID 6IE
6IH
ACM
ALMA_UNASSIGNED_HOLDINGS
CBEJK
RIE
RIO
ID FETCH-LOGICAL-a290t-142ac93447b5c38b7952b7e92978c68769f5fd13e98a8a74ecb107fd40ba93ee3
IEDL.DBID RIE
ISICitedReferencesCount 4
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001073487300168&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Aug 27 02:51:00 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-a290t-142ac93447b5c38b7952b7e92978c68769f5fd13e98a8a74ecb107fd40ba93ee3
PageCount 6
ParticipantIDs ieee_primary_10247855
PublicationCentury 2000
PublicationDate 2023-July-9
PublicationDateYYYYMMDD 2023-07-09
PublicationDate_xml – month: 07
  year: 2023
  text: 2023-July-9
  day: 09
PublicationDecade 2020
PublicationTitle 2023 60th ACM/IEEE Design Automation Conference (DAC)
PublicationTitleAbbrev DAC
PublicationYear 2023
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssib060584064
Score 2.2905202
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms Computational modeling
Computer vision
Decoding
Design automation
Heuristic algorithms
Interpolation
Natural language processing
Title Flex-SFU: Accelerating DNN Activation Functions by Non-Uniform Piecewise Approximation
URI https://ieeexplore.ieee.org/document/10247855
WOSCitedRecordID wos001073487300168&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE