A scalable all-digital near-memory computing architecture for edge AIoT applications

Bibliographic Details
Title: A scalable all-digital near-memory computing architecture for edge AIoT applications
Authors: Nouripayam, Masoud, Prieto, Arturo, Rodrigues, Joachim
Contributors: Lund University, Profile areas and other strong research environments, Strategic research areas (SRA), ELLIIT: the Linköping-Lund initiative on IT and mobile communication (Originator); Lund University, Faculty of Engineering (LTH), LTH Profile Area: AI and Digitalization (Originator); Lund University, Faculty of Engineering (LTH), LTH Profile Area: Nanoscience and Semiconductor Technology (Originator); Lund University, Faculty of Engineering (LTH), LTH Profile Area: Engineering Health (Originator)
Source: IEEE Access. 13:108609-108625
Subject Terms: Engineering and Technology, Electrical Engineering, Electronic Engineering, Information Engineering, Computer Systems, Natural Sciences, Computer and Information Sciences, Computer Engineering
Description: With the growing need to process large volumes of data, edge computing near data collection sources has become increasingly important. However, the resource constraints of edge devices require more efficient data processing techniques. Near-memory computing (NMC) presents an efficient solution, especially for data-intensive applications, by enabling processing that is both energy-efficient and hardware-optimized. This work introduces a platform-agnostic NMC architecture tailored for convolutional neural network (CNN) workloads, integrated into the shared cache memory subsystem of a microcontroller unit (MCU). An open-source RISC-V MCU is chosen as the target platform due to its flexibility and low-power architecture. The NMC co-processor, operating alongside the general-purpose RISC-V core, forms a multi-core system-on-chip that combines low hardware cost with high energy efficiency, while maintaining a high degree of flexibility. The proposed design offers a configurable architecture capable of processing a wide range of CNN models with a computational efficiency of 94%. For evaluation purposes, widely recognized CNN benchmark models are utilized, showing a performance of 96 GOPS and an energy efficiency of 1828 GOPS/W for 8-bit precision at 200 MHz. These results represent a significant improvement over both highly customized state-of-the-art hardware accelerators and multi-core MCU solutions.
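Note: as a back-of-the-envelope check using only the figures reported in the abstract, the implied power consumption at 8-bit precision and 200 MHz is

P \approx \frac{96\ \text{GOPS}}{1828\ \text{GOPS/W}} \approx 52.5\ \text{mW}

and, if the stated 94% computational efficiency denotes utilization of the peak compute rate (an assumption not confirmed by the record), the corresponding peak throughput would be roughly 96 / 0.94 \approx 102 GOPS.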
Access URL: https://doi.org/10.1109/ACCESS.2025.3582013
Database: SwePub
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2025.3582013