Softermax: Hardware/Software Co-Design of an Efficient Softmax for Transformers

Bibliographic details
Published in:	2021 58th ACM/IEEE Design Automation Conference (DAC), pp. 469-474
Main authors:	Stevens, Jacob R.; Venkatesan, Rangharajan; Dai, Steve; Khailany, Brucek; Raghunathan, Anand
Format:	Conference paper
Language:	English
Published:	IEEE, 05.12.2021
Subjects:	Deep learning; Design automation; Hardware; hardware/software codesign; Natural language processing; neural network accelerators; Neural networks; Software; Transformers

Description
Summary:	Transformers have transformed the field of natural language processing. Their superior performance is largely attributed to the use of stacked "self-attention" layers, each of which consists of matrix multiplies as well as softmax operations. As a result, unlike other neural networks, the softmax operation accounts for a significant fraction of the total run-time of Transformers. To address this, we propose Softermax, a hardware-friendly softmax design. Softermax consists of base replacement, low-precision softmax computations, and an online normalization calculation. We show Softermax results in 2.35x the energy efficiency at 0.90x the size of a comparable baseline, with negligible impact on network accuracy.
DOI:	10.1109/DAC18074.2021.9586134
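
Illustrative sketch (not taken from the paper): the summary names base replacement and an online normalization calculation as two ingredients of Softermax. The Python below shows one common way to read those two ideas, assuming that base replacement means using powers of 2 in place of e and that online normalization means tracking a running maximum and a running sum in a single pass over the scores. The function name softmax_base2_online is hypothetical, the arithmetic is ordinary floating point, and the low-precision part of the design is not modeled.

```python
import math


def softmax_base2_online(scores):
    """Base-2 softmax computed with a single-pass running max and running sum.

    Assumption: this is a generic illustration of the two ideas named in the
    summary, not the authors' hardware design or their number formats.
    """
    running_max = -math.inf
    running_sum = 0.0
    for x in scores:
        new_max = max(running_max, x)
        # Rescale the sum accumulated so far to the new reference maximum,
        # then add the contribution of the current score.
        running_sum = running_sum * 2.0 ** (running_max - new_max) + 2.0 ** (x - new_max)
        running_max = new_max
    # Normalize every score against the final maximum and sum.
    return [2.0 ** (x - running_max) / running_sum for x in scores]


probs = softmax_base2_online([0.0, 1.0, 2.0])
print(probs)
```

Under these assumptions the scores 0, 1, 2 map to weights in the ratio 1 : 2 : 4, because each unit step in the score doubles the weight. A base-2 softmax of x equals a base-e softmax of x multiplied by ln 2, so the change of base acts as a fixed rescaling of the inputs.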