Softermax: Hardware/Software Co-Design of an Efficient Softmax for Transformers
Transformers have transformed the field of natural language processing. Their superior performance is largely attributed to the use of stacked "self-attention" layers, each of which consists of matrix multiplies as well as softmax operations. As a result, unlike other neural networks, the...
Uloženo v:
| Vydáno v: | 2021 58th ACM/IEEE Design Automation Conference (DAC) s. 469 - 474 |
|---|---|
| Hlavní autoři: | , , , , |
| Médium: | Konferenční příspěvek |
| Jazyk: | angličtina |
| Vydáno: |
IEEE
05.12.2021
|
| Témata: | |
| On-line přístup: | Získat plný text |
| Tagy: |
Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
|
| Shrnutí: | Transformers have transformed the field of natural language processing. Their superior performance is largely attributed to the use of stacked "self-attention" layers, each of which consists of matrix multiplies as well as softmax operations. As a result, unlike other neural networks, the softmax operation accounts for a significant fraction of the total run-time of Transformers. To address this, we propose Softermax, a hardware-friendly softmax design. Softermax consists of base replacement, low-precision softmax computations, and an online normalization calculation. We show Softermax results in 2.35x the energy efficiency at 0.90x the size of a comparable baseline, with negligible impact on network accuracy. |
|---|---|
| DOI: | 10.1109/DAC18074.2021.9586134 |