Efficient Transformer Inference with Statically Structured Sparse Attention
Self-attention matrices of Transformers are often highly sparse because the relevant context of each token is typically limited to just a few other tokens in the sequence. To reduce the computational burden of self-attention on Transformer inference, we propose static, structured, sparse attention m...
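The abstract is truncated in this record, but the stated idea is to prune the attention computation with a fixed (static), structured sparsity pattern known before inference. The sketch below is illustrative only: the paper's actual mask family is not recoverable from this record, so the banded (local-window) pattern, the names `static_band_mask` and `sparse_attention`, and the `window` parameter are all assumptions made for the example.

```python
# Minimal sketch of attention under a static, structured sparsity mask.
# The banded (local-window) pattern is an assumption; the paper's actual
# mask structure is not specified in this truncated record.
import numpy as np

def static_band_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask: True where query i may attend to key j (|i - j| <= window)."""
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= window

def sparse_attention(Q, K, V, mask):
    """Softmax attention with disallowed positions set to -inf before the softmax."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)             # (seq_len, seq_len) logits
    scores = np.where(mask, scores, -np.inf)  # apply the static structured mask
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

seq_len, d = 8, 4
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((seq_len, d)) for _ in range(3))
out = sparse_attention(Q, K, V, static_band_mask(seq_len, window=2))
print(out.shape)  # (8, 4)
```

Because the mask is static, a hardware or kernel implementation can skip the masked score computations entirely rather than computing and discarding them, which is the source of the claimed inference savings.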
| Published in: | 2023 60th ACM/IEEE Design Automation Conference (DAC), pp. 1-6 |
|---|---|
| Main Authors: | |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 09.07.2023 |