Softermax: Hardware/Software Co-Design of an Efficient Softmax for Transformers

Bibliographic Details
Published in: 2021 58th ACM/IEEE Design Automation Conference (DAC), pp. 469-474
Main Authors: Stevens, Jacob R.; Venkatesan, Rangharajan; Dai, Steve; Khailany, Brucek; Raghunathan, Anand
Format: Conference Proceeding
Language: English
Published: IEEE, 05.12.2021
Subjects: Deep learning; Design automation; Hardware; hardware/software codesign; Natural language processing; neural network accelerators; Neural networks; Software; Transformers
DOI: 10.1109/DAC18074.2021.9586134
EISBN: 9781665432740
Online Access: https://ieeexplore.ieee.org/document/9586134
Abstract: Transformers have transformed the field of natural language processing. Their superior performance is largely attributed to the use of stacked "self-attention" layers, each of which consists of matrix multiplies as well as softmax operations. As a result, unlike other neural networks, the softmax operation accounts for a significant fraction of the total run-time of Transformers. To address this, we propose Softermax, a hardware-friendly softmax design. Softermax consists of base replacement, low-precision softmax computations, and an online normalization calculation. We show that Softermax results in 2.35x the energy efficiency at 0.90x the size of a comparable baseline, with negligible impact on network accuracy.
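
The abstract names three ingredients: replacing the exponential base, computing in low precision, and normalizing online. As a rough illustration of the first and third, here is a minimal single-pass sketch in Python; the function name is made up for this example, plain floats stand in for the paper's low-precision hardware arithmetic, and this is not the authors' implementation:

```python
def softermax_sketch(scores):
    """Single-pass base-2 softmax with online normalization:
    the running max and the normalizer are updated together,
    so the scores are read only once before the final divide.
    (Illustrative sketch; not the paper's hardware design.)"""
    m = float("-inf")  # running maximum of the scores seen so far
    s = 0.0            # running sum of 2**(score - m)
    for v in scores:
        m_new = max(m, v)
        # When the running max grows, rescale the partial sum so the
        # accumulated terms stay relative to the new max, then fold in
        # the current term. 2.0 ** float("-inf") evaluates to 0.0,
        # which handles the first iteration cleanly.
        s = s * 2.0 ** (m - m_new) + 2.0 ** (v - m_new)
        m = m_new
    # Emit outputs using the normalizer accumulated in the same pass.
    return [2.0 ** (v - m) / s for v in scores]
```

For example, `softermax_sketch([1.0, 2.0, 3.0])` returns values that sum to 1. Since 2**v = e**(v * ln 2), the base replacement is mathematically a standard softmax of the inputs scaled by ln 2; a power of two is far cheaper to evaluate in hardware than e**v, and the abstract reports negligible impact on network accuracy.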
Authors and Affiliations:
Stevens, Jacob R. (Purdue University, West Lafayette)
Venkatesan, Rangharajan (NVIDIA)
Dai, Steve (NVIDIA)
Khailany, Brucek (NVIDIA)
Raghunathan, Anand (Purdue University, West Lafayette)