Finding the Pareto Frontier of Low-Precision Data Formats and MAC Architecture for LLM Inference

Full Description

Bibliographic Details
Published in: 2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1-7
Authors: Crafton, Brian; Peng, Xiaochen; Sun, Xiaoyu; Lele, Ashwin; Zhang, Bo; Khwa, Win-San; Akarvardar, Kerem
Format: Conference Proceedings
Language: English
Published: IEEE, 22 June 2025
Description
Summary: To accelerate AI applications, numerous data formats and physical implementations of matrix multiplication have been proposed, creating a complex design space. This paper studies the efficient MAC implementation of the integer, floating-point, posit, and logarithmic number system (LNS) data formats and the Microscaling (MX) and VectorScaled Quantization (VSQ) block data formats. We evaluate the area, power, and numerical accuracy (measured as signal-to-quantization-noise ratio) of 35,000 MAC designs spanning each data format and several key design parameters such as the inner product size and accumulation width. We find that, for the same numerical accuracy, Pareto-optimal MAC designs with emerging data formats (LNS16, MXINT8, VSQINT4) achieve 1.8×, 2.2×, and 1.9× TOPs/W improvements compared to FP16, FP8, and FP4 dot product implementations, respectively.
DOI: 10.1109/DAC63849.2025.11132989
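
To make the accuracy metric in the abstract concrete, the following is a minimal Python sketch of computing SQNR for quantized data. It assumes simple symmetric INT8 quantization; the per-block variant is only a rough stand-in for block formats like MXINT8 (real MX formats share a power-of-two exponent per block), and none of this reflects the paper's exact experimental setup.

```python
import numpy as np

def sqnr_db(x, x_hat):
    """Signal-to-quantization-noise ratio in dB: signal power over error power."""
    return 10.0 * np.log10(np.mean(x**2) / np.mean((x - x_hat)**2))

def quantize_int8(x, scale):
    """Symmetric INT8 quantize-dequantize with the given scale(s)."""
    return np.clip(np.round(x / scale), -127, 127) * scale

rng = np.random.default_rng(0)
x = rng.standard_normal(4096).astype(np.float32)

# Per-tensor quantization: one scale shared by all elements.
per_tensor = quantize_int8(x, np.abs(x).max() / 127.0)

# Per-block quantization: one scale per block of 32 elements, a rough
# approximation of the block-scaling idea behind MX and VSQ formats.
blocks = x.reshape(-1, 32)
scales = np.abs(blocks).max(axis=1, keepdims=True) / 127.0
per_block = quantize_int8(blocks, scales).reshape(-1)

print(f"per-tensor INT8 SQNR: {sqnr_db(x, per_tensor):.1f} dB")
print(f"per-block  INT8 SQNR: {sqnr_db(x, per_block):.1f} dB")
```

The per-block variant typically reports a higher SQNR because each small block gets a scale matched to its own dynamic range, which is the intuition behind the block data formats the paper evaluates.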