| Contributors: |
Lund University, Profile areas and other strong research environments, Strategic research areas (SRA), ELLIIT: the Linköping-Lund initiative on IT and mobile communication (Originator); Lund University, Faculty of Engineering, LTH, LTH Profile areas, LTH Profile Area: AI and Digitalization (Originator); Lund University, Faculty of Engineering, LTH, LTH Profile areas, LTH Profile Area: Nanoscience and Semiconductor Technology (Originator); Lund University, Faculty of Engineering, LTH, LTH Profile areas, LTH Profile Area: Engineering Health (Originator) |
| Description: |
With the growing need to process large volumes of data, edge computing close to data collection sources has become increasingly important. However, the resource constraints of edge devices call for more efficient data processing techniques. Near-memory computing (NMC) is an efficient solution, especially for data-intensive applications, because it enables processing that is both energy-efficient and hardware-optimized. This work introduces a platform-agnostic NMC architecture tailored for convolutional neural network (CNN) workloads and integrated into the shared cache memory subsystem of a microcontroller unit (MCU). An open-source RISC-V MCU is chosen as the target platform for its flexibility and low-power architecture. The NMC co-processor, operating alongside the general-purpose RISC-V core, forms a multi-core system-on-chip that combines low hardware cost with high energy efficiency while retaining a high degree of flexibility. The proposed design offers a configurable architecture capable of processing a wide range of CNN models with a computational efficiency of 94%. For evaluation, widely recognized CNN benchmark models are used, showing a throughput of 96 GOPS and an energy efficiency of 1828 GOPS/W for 8-bit precision at 200 MHz. These results represent a significant improvement over both highly customized state-of-the-art hardware accelerators and multi-core MCU solutions. |
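
The figures quoted in the description can be related to one another with simple arithmetic. The sketch below is an illustration only, not part of the published work: it derives the power draw and operations per clock cycle implied by the stated throughput, energy efficiency, and clock frequency. The function names are hypothetical, and the reading of the 94% computational efficiency as the ratio of achieved to peak throughput is an assumption that the description does not confirm.

```python
# Figures quoted in the description (8-bit precision).
THROUGHPUT_GOPS = 96.0          # reported throughput
EFFICIENCY_GOPS_PER_W = 1828.0  # reported energy efficiency
CLOCK_MHZ = 200.0               # reported clock frequency
COMPUTE_EFFICIENCY = 0.94       # reported computational efficiency


def implied_power_mw(throughput_gops: float, gops_per_watt: float) -> float:
    """Power in milliwatts implied by throughput / energy efficiency."""
    return throughput_gops / gops_per_watt * 1000.0


def ops_per_cycle(throughput_gops: float, clock_mhz: float) -> float:
    """Average operations completed per clock cycle."""
    return (throughput_gops * 1e9) / (clock_mhz * 1e6)


def implied_peak_gops(throughput_gops: float, efficiency: float) -> float:
    """Peak throughput, ASSUMING efficiency = achieved / peak.

    This interpretation of the 94% figure is an assumption made for
    illustration; the description does not define the metric.
    """
    return throughput_gops / efficiency


power_mw = implied_power_mw(THROUGHPUT_GOPS, EFFICIENCY_GOPS_PER_W)
cycle_ops = ops_per_cycle(THROUGHPUT_GOPS, CLOCK_MHZ)
peak_gops = implied_peak_gops(THROUGHPUT_GOPS, COMPUTE_EFFICIENCY)

print(f"Implied power: {power_mw:.1f} mW")
print(f"Operations per cycle: {cycle_ops:.0f}")
print(f"Implied peak throughput (assumed definition): {peak_gops:.1f} GOPS")
```

Under these assumptions the reported numbers correspond to roughly 52.5 mW of power and 480 operations per cycle at 200 MHz. Whether the reported throughput is a peak or an average over the benchmark models is not stated, so the derived values should be read as order-of-magnitude checks only.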