Efficient Transformer Inference with Statically Structured Sparse Attention

Self-attention matrices of Transformers are often highly sparse because the relevant context of each token is typically limited to just a few other tokens in the sequence. To reduce the computational burden of self-attention on Transformer inference, we propose static, structured, sparse attention m...
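The abstract cuts off before describing the proposed masks, so the following is only a minimal sketch of the general idea rather than the paper's method: a sparsity pattern that is fixed ahead of time (here a local attention window, which is an assumption of this sketch) applied to scaled dot-product attention in NumPy.

import numpy as np

def local_window_mask(seq_len, window):
    # Static, structured sparsity: each query position attends only to key
    # positions within a fixed window of itself. The window pattern is an
    # illustrative assumption; the paper's actual mask is not given here.
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= window

def masked_attention(Q, K, V, mask):
    # Scaled dot-product attention; disallowed positions are set to -inf
    # before the softmax so they receive exactly zero weight.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
L, d = 8, 4
Q, K, V = (rng.standard_normal((L, d)) for _ in range(3))
out = masked_attention(Q, K, V, local_window_mask(L, window=2))
print(out.shape)  # (8, 4)

Because the mask is fixed before inference, the positions to skip are known statically, which is what allows kernels or hardware to avoid computing the masked scores at all.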

Bibliographic Details
Published in: 2023 60th ACM/IEEE Design Automation Conference (DAC), pp. 1-6
Main Authors: Dai, Steve; Genc, Hasan; Venkatesan, Rangharajan; Khailany, Brucek
Format: Conference Proceeding
Language: English
Published: IEEE, 09.07.2023