Finding the Pareto Frontier of Low-Precision Data Formats and MAC Architecture for LLM Inference

Detailed Bibliography
Published in: 2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1-7
Main Authors: Crafton, Brian; Peng, Xiaochen; Sun, Xiaoyu; Lele, Ashwin; Zhang, Bo; Khwa, Win-San; Akarvardar, Kerem
Format: Conference Paper
Language: English
Published: IEEE, 22 June 2025
Description
Summary: To accelerate AI applications, numerous data formats and physical implementations of matrix multiplication have been proposed, creating a complex design space. This paper studies efficient MAC implementations of the integer, floating-point, posit, and logarithmic number system (LNS) data formats, as well as the Microscaling (MX) and Vector-Scaled Quantization (VSQ) block data formats. We evaluate the area, power, and numerical accuracy (measured as signal-to-quantization-noise ratio, SQNR) of 35,000 MAC designs spanning each data format and several key design parameters, such as inner product size and accumulation width. We find that, at the same numerical accuracy, Pareto-optimal MAC designs with emerging data formats (LNS16, MXINT8, VSQINT4) achieve 1.8×, 2.2×, and 1.9× TOPS/W improvements over FP16, FP8, and FP4 dot-product implementations, respectively.
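The abstract's accuracy metric and block formats lend themselves to a small illustration. The Python sketch below is not from the paper; the function names, block size, and scale rule are simplifying assumptions in the spirit of MXINT8-style Microscaling. It quantizes a vector with a shared power-of-two scale per 32-element block and reports the resulting signal-to-quantization-noise ratio in dB:

```python
import numpy as np

def quantize_mxint8_like(x, block_size=32):
    """Hypothetical MXINT8-style block quantization: each block of
    `block_size` values shares one power-of-two scale, and elements
    are rounded to signed 8-bit integers. Returns dequantized values."""
    blocks = x.reshape(-1, block_size)
    max_mag = np.abs(blocks).max(axis=1, keepdims=True)
    max_mag[max_mag == 0] = 1.0  # avoid log2(0) for all-zero blocks
    # Shared scale: smallest power of two that maps the block's max
    # magnitude into the 8-bit element range (simplified rule).
    scale = 2.0 ** np.ceil(np.log2(max_mag / 127.0))
    q = np.clip(np.round(blocks / scale), -128, 127)
    return (q * scale).reshape(x.shape)

def sqnr_db(reference, quantized):
    """Signal-to-quantization-noise ratio in dB."""
    noise = reference - quantized
    return 10.0 * np.log10(np.sum(reference**2) / np.sum(noise**2))

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
x_hat = quantize_mxint8_like(x)
print(f"SQNR: {sqnr_db(x, x_hat):.1f} dB")  # higher dB = closer to reference
```

Sweeping the element width and block size of such a quantizer over reference data gives, in spirit, the accuracy axis of the design-space study the abstract describes; the paper's area and power axes come from hardware implementation, which this sketch does not model.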
DOI: 10.1109/DAC63849.2025.11132989