AttentionLib: A Scalable Optimization Framework for Automated Attention Acceleration on FPGA
The self-attention mechanism is a fundamental component of transformer-based models. As the sequences processed by large language models (LLMs) grow longer, the attention mechanism has increasingly become a bottleneck in model inference. The LLM inference process can b...
| Published in: | Proceedings - Design, Automation, and Test in Europe Conference and Exhibition pp. 1 - 7 |
|---|---|
| Format: | Conference Proceeding |
| Language: | English |
| Published: | EDAA, 31.03.2025 |
| ISSN: | 1558-1101 |