Finding the Pareto Frontier of Low-Precision Data Formats and MAC Architecture for LLM Inference
| Published in: | 2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1–7 |
|---|---|
| Main authors: | |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 22.06.2025 |
| Topics: | |
| Online access: | Get full text |
| Summary: | To accelerate AI applications, numerous data formats and physical implementations of matrix multiplication have been proposed, creating a complex design space. This paper studies efficient MAC implementations of the integer, floating-point, posit, and logarithmic number system (LNS) data formats, as well as the Microscaling (MX) and VectorScaled Quantization (VSQ) block data formats. We evaluate the area, power, and numerical accuracy (measured as signal-to-quantization-noise ratio, SQNR) of 35,000 MAC designs spanning each data format and several key design parameters, such as inner product size and accumulation width. We find that, at the same numerical accuracy, Pareto-optimal MAC designs with emerging data formats (LNS16, MXINT8, VSQINT4) achieve 1.8×, 2.2×, and 1.9× TOPS/W improvements over FP16, FP8, and FP4 dot-product implementations, respectively. |
| DOI: | 10.1109/DAC63849.2025.11132989 |
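
To make the abstract's accuracy metric concrete, the sketch below (Python/NumPy, not taken from the paper) measures SQNR for a vector quantized to an MX-style block format, i.e., INT8 elements sharing one power-of-two scale per block. The block size, scale rule, and `quantize_mx_int8` helper are illustrative assumptions; the paper's exact MXINT8 definition and measurement methodology may differ.

```python
import numpy as np

def quantize_mx_int8(x, block_size=32):
    """Quantize-dequantize a vector in an MX-style block format:
    INT8 elements with one shared power-of-two scale per block.
    Illustrative sketch only; not the paper's exact MXINT8 spec."""
    blocks = x.reshape(-1, block_size)
    # Smallest power-of-two scale that fits the block's max magnitude into [-127, 127].
    max_mag = np.abs(blocks).max(axis=1, keepdims=True)
    scale = 2.0 ** np.ceil(np.log2(max_mag / 127.0 + 1e-30))
    q = np.clip(np.round(blocks / scale), -127, 127)
    return (q * scale).reshape(-1)  # dequantized values

def sqnr_db(signal, quantized):
    """Signal-to-quantization-noise ratio in dB."""
    noise = signal - quantized
    return 10.0 * np.log10(np.sum(signal**2) / np.sum(noise**2))

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
print(f"MXINT8-style SQNR: {sqnr_db(x, quantize_mx_int8(x)):.1f} dB")
```

Sweeping parameters like `block_size` in such a sketch illustrates the accuracy side of the area/power/SQNR trade-off the paper explores across its 35,000 MAC designs.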