Overview of Memory-Efficient Architectures for Deep Learning in Real-Time Systems

Bibliographic Details
Published in: Engineering Proceedings, Vol. 104, No. 1, p. 77
Main Authors: Bilgin Demir, Ervin Domazet, Daniela Mechkaroska
Format: Journal Article
Language: English
Published: MDPI AG, 01.09.2025
ISSN:2673-4591
Description

Summary: With advancements in artificial intelligence (AI), deep learning (DL) has become crucial for real-time data analytics in areas like autonomous driving, healthcare, and predictive maintenance; however, its computational and memory demands often exceed the capabilities of low-end devices. This paper explores optimizing deep learning architectures for memory efficiency to enable real-time computation in low-power designs. Strategies include model compression, quantization, and efficient network designs. Techniques such as eliminating unnecessary parameters, sparse representations, and optimized data handling significantly enhance system performance. The design addresses cache utilization, memory hierarchies, and data movement, reducing latency and energy use. By comparing memory management methods, this study highlights dynamic pruning and adaptive compression as effective solutions for improving efficiency and performance. These findings guide the development of accurate, power-efficient deep learning systems for real-time applications, unlocking new possibilities for edge and embedded AI.
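To make two of the techniques named in the abstract concrete, the following is a minimal NumPy sketch of magnitude-based weight pruning and symmetric 8-bit quantization. It is an illustrative example under generic assumptions, not the authors' implementation; function names and the 50% sparsity target are chosen here for demonstration.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude weights until the target sparsity is reached."""
    flat = np.abs(weights).ravel()
    k = int(len(flat) * sparsity)
    if k == 0:
        return weights.copy()
    # Threshold at the k-th smallest magnitude; everything at or below it is dropped.
    threshold = np.partition(flat, k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

def quantize_int8(weights):
    """Symmetric int8 quantization: a single scale maps floats onto [-127, 127]."""
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 representation."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)

w_pruned = magnitude_prune(w, sparsity=0.5)   # sparse representation
q, scale = quantize_int8(w_pruned)            # 4x smaller storage than float32
w_restored = dequantize(q, scale)

print("sparsity:", np.mean(w_pruned == 0))
print("max abs quantization error:", np.max(np.abs(w_pruned - w_restored)))
```

Together these reduce both memory footprint (int8 storage is a quarter of float32) and data movement (zeroed weights can be skipped by sparse kernels), which is the mechanism by which pruning and quantization lower latency and energy use on memory-constrained devices.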
DOI:10.3390/engproc2025104077