A Microscaling Multi-Mode Gain-Cell Computing-in-Memory Macro for Advanced AI Edge Device

Bibliographic details
Published in:	IEEE Journal of Solid-State Circuits, pp. 1-14
Authors:	Tien, Jen-Chun, Wu, Ping-Chun, Khwa, Win-San, Sanjay Lele, Ashwin, Su, Jian-Wei, Cheng, Chiao-Yen, Hsu, Jun-Ming, Chen, Yu-Chen, Hsieh, Le-Jung, Bai, Jyun-Cheng, Kao, Yu-Sheng, Lou, Tsung-Han, Wu, Jui-Jen, Lo, Chung-Chuan, Liu, Ren-Shuo, Hsieh, Chih-Cheng, Tang, Kea-Tiong, Chang, Meng-Fan
Format:	Journal Article
Language:	English
Published:	IEEE, 2025
Keywords:	Accuracy; Adders; Artificial intelligence (AI); Artificial neural networks; Common Information Model (computing); Common Information Model (electricity); Computer architecture; computing-in-memory (CIM); Data transfer; Energy consumption; gain cell (GC); Hardware; In-memory computing; microscaling (MX); multiply-and-accumulate (MAC)
ISSN:	0018-9200, 1558-173X

Description
Abstract:	The microscaling (MX) format is an emerging data representation that quantizes high-bitwidth floating-point (FP) values into low-bitwidth FP-like values with a shared-scale (SS) exponent. When implemented with computing-in-memory (CIM), MX allows an attractive tradeoff between accuracy and hardware efficiency for specific neural network (NN) workloads. This work presents the first multi-mode gain-cell (GC) CIM macro capable of processing MX, integer (INT), and FP multiply-and-accumulate (MAC) operations with high energy efficiency (EEF) and area efficiency (AEF). The proposed macro employs four key innovations: 1) a multi-mode input processing unit (M2-IPU) with an SS-variance-aware MAC flow (SS-VAF), which performs SS processing and SS alignment within the CIM macro to reduce system-to-CIM data transfer and compute energy; 2) a pattern-aware hybrid adder tree (PAH-ADT), which improves EEF and AEF by optimizing for common input patterns; 3) an accumulation-aware data flow (A2-DF), which adjusts the write path based on accumulation size to reduce data transfer energy; and 4) a 3.xT GC, which boosts data retention time (DRT) by increasing parasitic capacitance without additional area overhead. A 16-nm FinFET 216-kb MX-INT-FP multi-mode GC-CIM macro achieved 133.5 TFLOPS/W for MX-MAC with MXINT8 input, MXINT8 weight, and FP32 output, and 91.9 TFLOPS/W for FP-MAC with BF16 input, BF16 weight, and FP32 output.
DOI:	10.1109/JSSC.2025.3617944
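To make the shared-scale idea in the abstract concrete, the following is a minimal Python sketch of MXINT8 quantization and an MX dot product. It assumes the conventions of the OCP Microscaling Formats (MX) v1.0 specification (blocks of 32 elements sharing one power-of-two E8M0 scale, and INT8 elements carrying an implicit 2^-6 scale), not anything stated in this record. The names `quantize_mxint8` and `mx_dot` are hypothetical. The sketch does not model the macro's actual datapath (M2-IPU, SS-VAF, PAH-ADT, A2-DF).

```python
import math
import random
from typing import List, Tuple

# Assumed parameters from the OCP MX v1.0 spec, not taken from this paper.
BLOCK_SIZE = 32                    # elements sharing one scale
ELEM_FRAC_BITS = 6                 # MXINT8 element value = q * 2**-6, q is int8
SCALE_MIN, SCALE_MAX = -127, 127   # E8M0 shared-exponent range

MXBlock = Tuple[int, List[int]]    # (shared exponent, int8 elements)


def quantize_mxint8(block: List[float]) -> MXBlock:
    """Quantize FP values to one MXINT8 block: value ~= q * 2**(shared_exp - 6)."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 0, [0] * len(block)
    # frexp gives amax = m * 2**e with 0.5 <= m < 1, so floor(log2(amax)) == e - 1
    _, e = math.frexp(amax)
    shared_exp = max(SCALE_MIN, min(SCALE_MAX, e - 1))
    step = 2.0 ** (shared_exp - ELEM_FRAC_BITS)
    # Round to nearest (Python's round: ties to even), clamp symmetrically to int8
    q = [max(-127, min(127, round(v / step))) for v in block]
    return shared_exp, q


def mx_dot(a: MXBlock, b: MXBlock) -> float:
    """MX-MAC sketch: integer MAC over elements, then one scaling by both shared exponents."""
    (ea, qa), (eb, qb) = a, b
    acc = sum(x * y for x, y in zip(qa, qb))  # exact integer accumulation
    # Apply the combined scale once; Python's float (FP64) stands in for the FP32 output
    return acc * 2.0 ** (ea + eb - 2 * ELEM_FRAC_BITS)


# Usage: compare an FP reference dot product with the MX approximation
random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(BLOCK_SIZE)]    # activations
w = [random.uniform(-0.05, 0.05) for _ in range(BLOCK_SIZE)]  # weights
exact = sum(p * q for p, q in zip(x, w))
approx = mx_dot(quantize_mxint8(x), quantize_mxint8(w))
print(f"exact={exact:.6f}  mx={approx:.6f}")
```

In this sketch, every element of a block shares one exponent, so the multiply-accumulate itself is pure integer arithmetic and the shared scales are applied once per block pair. That is the general property that lets MX trade some accuracy for cheaper hardware.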