Finding the Pareto Frontier of Low-Precision Data Formats and MAC Architecture for LLM Inference

Bibliographic Details
Published in: 2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1-7
Main Authors: Crafton, Brian, Peng, Xiaochen, Sun, Xiaoyu, Lele, Ashwin, Zhang, Bo, Khwa, Win-San, Akarvardar, Kerem
Format: Conference Proceeding
Language: English
Published: IEEE, 22.06.2025
Description
Summary: To accelerate AI applications, numerous data formats and physical implementations of matrix multiplication have been proposed, creating a complex design space. This paper studies the efficient MAC implementation of the integer, floating-point, posit, and logarithmic number system (LNS) data formats, as well as the Microscaling (MX) and VectorScaled Quantization (VSQ) block data formats. We evaluate the area, power, and numerical accuracy (measured as signal-to-quantization-noise ratio, SQNR) of 35,000 MAC designs spanning each data format and several key design parameters such as the inner product size and accumulation width. We find that, for the same numerical accuracy, Pareto-optimal MAC designs with emerging data formats (LNS16, MXINT8, VSQINT4) achieve 1.8×, 2.2×, and 1.9× TOPS/W improvements compared to FP16, FP8, and FP4 dot-product implementations, respectively.
DOI:10.1109/DAC63849.2025.11132989
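The summary reports numerical accuracy as signal-to-quantization-noise ratio (SQNR). As a rough illustration of that metric only (not taken from the paper), the following Python sketch measures the SQNR of an INT8-quantized dot product against a full-precision reference; the symmetric quantization scheme, vector length, and per-tensor scale choice below are assumptions made for the example.

    # Illustrative sketch: SQNR of an INT8 dot product vs. a full-precision reference.
    # The quantization scheme and sizes are assumptions, not the paper's methodology.
    import numpy as np

    def quantize_int8(x: np.ndarray, scale: float) -> np.ndarray:
        """Symmetric INT8 quantization: round to integers in [-127, 127], then dequantize."""
        q = np.clip(np.round(x / scale), -127, 127)
        return q * scale

    def sqnr_db(reference: np.ndarray, approx: np.ndarray) -> float:
        """SQNR in dB: ratio of signal power to quantization-error power."""
        noise = reference - approx
        return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))

    rng = np.random.default_rng(0)
    a = rng.standard_normal((1000, 64))  # 1000 dot products of length 64
    b = rng.standard_normal((1000, 64))

    # Per-tensor scales chosen so the maximum magnitude maps to 127 (an assumption).
    scale_a = np.abs(a).max() / 127.0
    scale_b = np.abs(b).max() / 127.0

    ref = np.sum(a * b, axis=1)  # full-precision reference dot products
    quant = np.sum(quantize_int8(a, scale_a) * quantize_int8(b, scale_b), axis=1)

    print(f"INT8 dot-product SQNR: {sqnr_db(ref, quant):.1f} dB")

Block formats such as MX and VSQ differ from this sketch mainly in that scales are shared per small block of elements rather than per tensor, which is what lets narrow element types retain usable SQNR.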