MILLION: Mastering Long-Context LLM Inference via Outlier-Immunized KV Product Quantization

Large language models (LLMs) are increasingly utilized for complex tasks requiring longer context lengths, with some models supporting up to 128K or 1M tokens. This trend, however, presents significant challenges in inference speed and memory management. The primary bottleneck in long-context LLM...
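The title names product quantization (PQ) of the key/value (KV) cache as the core technique. Below is a minimal sketch of plain PQ applied to cached key vectors, assuming a per-subspace k-means codebook; the function names, parameters, and the absence of any outlier handling are illustrative assumptions, not the MILLION method described in the paper.

import numpy as np

def train_pq_codebooks(vectors, num_subspaces=4, num_centroids=256, iters=10, seed=0):
    # One k-means codebook per subspace; vectors: (n, d) with d divisible by num_subspaces.
    rng = np.random.default_rng(seed)
    n, d = vectors.shape
    sub_dim = d // num_subspaces
    codebooks = []
    for s in range(num_subspaces):
        sub = vectors[:, s * sub_dim:(s + 1) * sub_dim]
        centroids = sub[rng.choice(n, num_centroids, replace=False)].copy()
        for _ in range(iters):  # plain Lloyd iterations
            assign = ((sub[:, None, :] - centroids[None, :, :]) ** 2).sum(-1).argmin(1)
            for c in range(num_centroids):
                members = sub[assign == c]
                if len(members):
                    centroids[c] = members.mean(0)
        codebooks.append(centroids)
    return codebooks

def pq_encode(vectors, codebooks):
    # Replace each sub-vector with the index of its nearest centroid (one byte per subspace).
    sub_dim = codebooks[0].shape[1]
    codes = [((vectors[:, s * sub_dim:(s + 1) * sub_dim][:, None, :] - cb[None, :, :]) ** 2)
             .sum(-1).argmin(1).astype(np.uint8) for s, cb in enumerate(codebooks)]
    return np.stack(codes, axis=1)

def pq_decode(codes, codebooks):
    # Reconstruct approximate vectors by looking the codes up in the codebooks.
    return np.concatenate([cb[codes[:, s]] for s, cb in enumerate(codebooks)], axis=1)

# Toy usage: 1024 cached key vectors of dim 64; fp32 -> 4 one-byte codes each (64x smaller).
keys = np.random.randn(1024, 64).astype(np.float32)
cbs = train_pq_codebooks(keys)
codes = pq_encode(keys, cbs)
print("reconstruction mse:", ((keys - pq_decode(codes, cbs)) ** 2).mean())

Plain PQ like this is known to degrade when a few activation dimensions carry outlier magnitudes; the "outlier-immunized" qualifier in the title presumably targets exactly that weakness, so the sketch should be read as the baseline being improved upon rather than the paper's contribution.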

Bibliographic Details
Published in: 2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1-7
Main Authors: Wang, Zongwu, Xu, Peng, Liu, Fangxin, Hu, Yiwei, Sun, Qingxiao, Li, Gezi, Li, Cheng, Wang, Xuan, Jiang, Li, Guan, Haibing
Format: Conference Proceeding
Language: English
Published: IEEE, 22.06.2025