Finding the Pareto Frontier of Low-Precision Data Formats and MAC Architecture for LLM Inference


Detailed bibliography
Published in: 2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1-7
Main authors: Crafton, Brian; Peng, Xiaochen; Sun, Xiaoyu; Lele, Ashwin; Zhang, Bo; Khwa, Win-San; Akarvardar, Kerem
Format: Conference paper
Language: English
Publication details: IEEE, June 22, 2025
Description
Summary: To accelerate AI applications, numerous data formats and physical implementations of matrix multiplication have been proposed, creating a complex design space. This paper studies efficient MAC implementations of the integer, floating-point, posit, and logarithmic number system (LNS) data formats, as well as the Microscaling (MX) and VectorScaled Quantization (VSQ) block data formats. We evaluate the area, power, and numerical accuracy (measured as signal-to-quantization-noise ratio, SQNR) of 35,000 MAC designs spanning each data format and several key design parameters, such as inner product size and accumulation width. We find that, at the same numerical accuracy, Pareto-optimal MAC designs with emerging data formats (LNS16, MXINT8, VSQINT4) achieve 1.8×, 2.2×, and 1.9× TOPS/W improvements compared to FP16, FP8, and FP4 dot product implementations, respectively.
DOI: 10.1109/DAC63849.2025.11132989
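For readers unfamiliar with the accuracy metric, the paper scores each format by the signal-to-quantization-noise ratio (SQNR) of its dot products against a high-precision reference. The Python sketch below is a minimal illustration of that kind of measurement, not the authors' evaluation code: the quantization schemes (symmetric per-vector INT8, and an MXINT8-style block format sharing one power-of-two scale per 32 elements), the vector lengths, and the random test data are all assumptions.

```python
import numpy as np

def sqnr_db(ref, test):
    """Signal-to-quantization-noise ratio in dB."""
    noise = ref - test
    return 10.0 * np.log10(np.sum(ref ** 2) / np.sum(noise ** 2))

def quant_int8(x):
    """Symmetric per-vector INT8 quantization (assumed scheme)."""
    scale = np.max(np.abs(x), axis=-1, keepdims=True) / 127.0
    q = np.clip(np.round(x / scale), -127, 127)
    return q * scale  # dequantized values

def quant_mxint8(x, block=32):
    """MXINT8-style block quantization: INT8 elements sharing one
    power-of-two scale per block of 32 (a simplification based on the
    public MX spec, not the paper's implementation)."""
    xb = x.reshape(*x.shape[:-1], -1, block)
    amax = np.max(np.abs(xb), axis=-1, keepdims=True)
    scale = 2.0 ** np.ceil(np.log2(amax / 127.0 + 1e-30))
    q = np.clip(np.round(xb / scale), -127, 127)
    return (q * scale).reshape(x.shape)

rng = np.random.default_rng(0)
a = rng.standard_normal((4096, 128))  # 4096 dot products of length 128
b = rng.standard_normal((4096, 128))

# High-precision reference dot products. Accumulation here is exact
# float64; the paper additionally sweeps finite accumulator widths.
ref = np.sum(a * b, axis=-1)

for name, quant in [("INT8", quant_int8), ("MXINT8-style", quant_mxint8)]:
    test = np.sum(quant(a) * quant(b), axis=-1)
    print(f"{name:13s} dot-product SQNR: {sqnr_db(ref, test):.1f} dB")
```

Sweeping a measurement of this kind across formats and design parameters (inner product size, accumulation width), alongside synthesized area and power, is in spirit what the paper does at scale over its 35,000 MAC designs.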