A Full-system, Programmable, and Extensible In-Memory Computing Simulation Framework for Deep Learning

Bibliographic Details
Published in: 2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1-7
Main Authors: Zhou, Kaining; Huang, Jian; Kim, Nam Sung; Shanbhag, Naresh
Format: Conference Proceeding
Language: English
Published: IEEE 22.06.2025
Description
Summary: In-memory computing (IMC) has established itself as an attractive alternative to hardware accelerators in addressing the memory wall problem for artificial intelligence (AI) workloads. However, designing programmable IMC-based computing platforms for today's large generative AI models, such as large language models (LLMs) and diffusion transformers (DiTs), is hindered by the absence of a simulator that is able to address the associated scalability challenges while simultaneously incorporating device and circuit-level behaviors intrinsic to IMCs. To address this challenge, we present IMCsim, a versatile full-system IMC simulation framework. IMCsim integrates software runtime libraries for AI models, introduces a new set of ISA extensions to express common tensor operators, and provides flexibility in mapping these operators to various IMC architectures. As such, IMCsim enables designers to explore trade-offs between performance, energy, area, and computational accuracy for various IMC design choices. To demonstrate the functionality, efficiency, and versatility of IMCsim, we model three types of IMCs: (1) embedded non-volatile memory (eNVM)-based, (2) SRAM-based, and (3) digital IMCs. We validate IMCsim using measured data from two laboratory-tested IMC prototype ICs (a 22 nm MRAM-based IMC and a 28 nm SRAM-based IMC) and a digital IMC design in 28 nm. Next, we demonstrate the utility of IMCsim by exploring the architectural design space to obtain insights for maximizing utilization of IMC-based processors for diverse workloads (ResNet-18, Llama, and a DiT) using the three IMC types. Finally, we employ IMCsim as a design tool to obtain an efficient chip architecture and layout in 28 nm for a lightweight DiT.
DOI: 10.1109/DAC63849.2025.11132463