Efficient Transformer Inference with Statically Structured Sparse Attention

Self-attention matrices of Transformers are often highly sparse because the relevant context of each token is typically limited to just a few other tokens in the sequence. To reduce the computational burden of self-attention on Transformer inference, we propose static, structured, sparse attention m...

Bibliographic details
Published in:	2023 60th ACM/IEEE Design Automation Conference (DAC), pp. 1-6
Main authors:	Dai, Steve; Genc, Hasan; Venkatesan, Rangharajan; Khailany, Brucek
Format:	Conference paper
Language:	English
Publication details:	IEEE, 9 July 2023
Subjects:	Deep learning; Design automation; Energy consumption; Inference algorithms; Sparse matrices; Task analysis; Transformers