A Microscaling Multi-Mode Gain-Cell Computing-in-Memory Macro for Advanced AI Edge Device

Bibliographic Details
Published in: IEEE Journal of Solid-State Circuits, pp. 1-14
Main authors: Tien, Jen-Chun; Wu, Ping-Chun; Khwa, Win-San; Sanjay Lele, Ashwin; Su, Jian-Wei; Cheng, Chiao-Yen; Hsu, Jun-Ming; Chen, Yu-Chen; Hsieh, Le-Jung; Bai, Jyun-Cheng; Kao, Yu-Sheng; Lou, Tsung-Han; Wu, Jui-Jen; Lo, Chung-Chuan; Liu, Ren-Shuo; Hsieh, Chih-Cheng; Tang, Kea-Tiong; Chang, Meng-Fan
Format: Journal Article
Language: English
Published: IEEE, 2025
ISSN: 0018-9200, 1558-173X
Description
Summary: The microscaling (MX) format is an emerging data representation that quantizes high-bitwidth floating-point (FP) values into low-bitwidth FP-like values with a shared-scale (SS) exponent. When implemented with computing-in-memory (CIM), MX allows an attractive tradeoff between accuracy and hardware efficiency for specific neural network (NN) workloads. This work presents the first multi-mode gain-cell (GC) CIM macro capable of processing MX, integer (INT), and FP multiply-and-accumulate (MAC) operations with high energy efficiency (EEF) and area efficiency (AEF). The proposed macro employs four important innovations: 1) a multi-mode input processing unit (M2-IPU) with SS-variance-aware MAC flow (SS-VAF) for SS processing and SS alignment within the CIM macro to reduce system-to-CIM data transfer and compute energy; 2) a pattern-aware hybrid adder tree (PAH-ADT), which improves EEF and AEF by optimizing the common input patterns; 3) an accumulation-aware data flow (A2-DF) that adjusts the write path based on accumulation size to reduce data transfer energy; and 4) a 3.xT GC, which boosts data retention time (DRT) by increasing parasitic capacitance without additional area overhead. A 16-nm FinFET 216-kb MX-INT-FP multi-mode GC-CIM macro achieved 133.5 TFLOPS/W for MX-MAC with MXINT8 input, MXINT8 weight, and FP32 output, and 91.9 TFLOPS/W for FP-MAC with BF16 input, BF16 weight, and FP32 output.
DOI: 10.1109/JSSC.2025.3617944
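
Illustrative note: the summary describes MX as quantizing floating-point values into low-bitwidth elements that share one scale exponent. The following is a minimal software sketch of that idea only, not the macro's hardware data flow. The function names `mx_quantize` and `mx_dot`, the way the shared exponent is chosen, and the use of plain signed integers as elements are all assumptions made for illustration; they are not taken from the article.

```python
import math

def mx_quantize(block, elem_bits=8):
    """Quantize one block of floats to integers sharing a power-of-two scale.

    Assumption: the shared exponent is the smallest e such that the largest
    magnitude in the block, divided by 2**e, still fits the element range.
    """
    qmax = 2 ** (elem_bits - 1) - 1            # 127 for 8-bit elements
    max_abs = max(abs(v) for v in block)
    if max_abs == 0.0:
        return 0, [0] * len(block)             # all-zero block: any scale works
    shared_exp = math.ceil(math.log2(max_abs / qmax))
    scale = 2.0 ** shared_exp
    # Round to nearest integer and clamp to the symmetric element range.
    elems = [max(-qmax, min(qmax, round(v / scale))) for v in block]
    return shared_exp, elems

def mx_dot(exp_a, elems_a, exp_b, elems_b):
    """Dot product of two quantized blocks.

    The multiply-accumulate runs on integers; the two shared exponents are
    applied once at the end rather than per element.
    """
    acc = sum(a * b for a, b in zip(elems_a, elems_b))
    return acc * 2.0 ** (exp_a + exp_b)

if __name__ == "__main__":
    x_exp, x_q = mx_quantize([0.5, -1.0, 0.25, 2.0])
    w_exp, w_q = mx_quantize([1.0, 1.0, 1.0, 1.0])
    print(x_exp, x_q)
    print(mx_dot(x_exp, x_q, w_exp, w_q))
```

Because the scale is shared across the block, the inner loop needs only integer multiplies and adds, and the exponent handling collapses to a single addition per block pair. This is the general property that makes the format attractive for MAC hardware; how the macro itself processes and aligns shared scales is described in the article, not in this sketch.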