iMCU: A 28-nm Digital In-Memory Computing-Based Microcontroller Unit for TinyML

Bibliographic Details
Published in:	IEEE Journal of Solid-State Circuits, Vol. 59, no. 8, pp. 2684-2693
Main Authors:	Lin, Chuan-Tung, Huang, Paul Xuanyuanliang, Oh, Jonghyun, Wang, Dewei, Seok, Mingoo
Format:	Journal Article
Language:	English
Published:	New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.08.2024
Subjects:	Artificial neural networks; Computation; Computer architecture; Data models; Deep learning; Density; Energy efficiency; Hardware; hardware/software co-design; In-memory computing; in-memory computing (IMC); Machine learning; microcontroller units (MCUs); Microcontrollers; Multiplication; neural network accelerators; Random access memory; Software; Software development; State-of-the-art reviews; Static random access memory; tiny machine learning (TinyML)
ISSN:	0018-9200, 1558-173X

Description
Summary:	Tiny machine learning (TinyML) envisions executing deep neural network (DNN)-based inference on an edge device to improve battery life, latency, security, and privacy. Toward this vision, recent microcontroller units (MCUs) integrate in-memory computing (IMC) hardware to leverage its high energy efficiency and throughput in vector-matrix multiplication (VMM). However, these existing works require large IMC hardware, which severely increases the area overhead. In addition, most existing works use analog-mixed-signal (AMS) IMC hardware, which exhibits limited robustness over process, voltage, and temperature (PVT) variations. Finally, none supports a practical software development framework such as TensorFlow Lite for Microcontrollers (TFLite-micro). Due to these limitations, those MCUs did not report performance on the standard benchmark MLPerf-Tiny, which makes it difficult to evaluate them against the state-of-the-art neural (not necessarily IMC-based) MCUs. In this article, we design a new IMC-based MCU, titled iMCU, for TinyML to address those challenges. In the design process, we: 1) define the optimal set of acceleration targets and 2) devise an area-efficient computation flow that requires the least amount of IMC hardware yet still provides significant acceleration. In addition, we: 1) develop state-of-the-art digital IMC macros and 2) create an accelerator based on the macros, which supports the proposed computation flow in a fully pipelined manner. Combining those innovations, we prototyped the iMCU in 28-nm CMOS. The measurement results show that the iMCU significantly outperforms the prior IMC-based MCUs in compute density, energy efficiency, and SRAM density (total SRAM size/total SRAM area). It also achieves a compact footprint of 2.73 mm².
DOI:	10.1109/JSSC.2024.3362274