Multiendpoint DAG-Driven Joint Partitioning-Offloading and Scheduling Optimization for DNN Inference

Bibliographic Details
Published in:	IEEE Internet of Things Journal, Vol. 12, no. 19, pp. 41087-41102
Main Authors:	Yan, Xiukun; Zhang, Xuexue; Zeng, Kai; Bai, Fenhua; Shen, Tao; Cao, Bin
Format:	Journal Article
Language:	English
Published:	Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.10.2025
Subjects:	Adaptive algorithms; Artificial neural networks; Chemical partition; Collaboration; Computational modeling; Constrained scheduling; deep neural network (DNN) inference; Delays; Design optimization; distributed computing; Dynamic loads; Dynamic scheduling; Edge computing; Inference; Internet of Things; Load fluctuation; Mobile computing; model partitioning; multiendpoint directed acyclic graph (DAG); Optimization; Partitioning; Pipelines; Queueing; Resource management; Retrieval; Scheduling; Servers
ISSN:	2327-4662

Description
Summary:	Model partitioning techniques, which decompose and collaboratively execute subtasks of deep neural networks (DNNs), have emerged as a critical strategy for enhancing distributed inference efficiency. However, in mobile edge computing (MEC), dynamic load fluctuations at edge nodes and the complexity of cross-node task dependencies make the delay minimization problem extremely challenging. Existing studies predominantly adopt a decoupled optimization framework that separately addresses partitioning-offloading and pipeline scheduling, neglecting their inherent cyclic state-dependent coupling. This oversight leads to suboptimal solutions, such as pipeline stagnation caused by mismatched computation and communication timestamps. To address these challenges, we propose a multiendpoint directed acyclic graph (DAG)-driven cooperative optimization approach, enabling partitioning-offloading and pipeline scheduling in MEC. Specifically, the approach involves two core steps: 1) Dynamic prescheduling: We propose an improved DNN scheduling algorithm for constrained subtasks, which simulates node-level queuing delays and pipeline stalls under real-world constraints, translating runtime states into latency objectives. 2) Partitioning and offloading solution retrieval: Based on latency objectives, we introduce a novel multiendpoint DAG structure and design a multinode collaborative optimization retrieval algorithm, enabling adaptive partitioning-offloading remapping of subtasks. Experiments demonstrate the superiority of the proposed method over other advanced methods, reducing the time overhead by an average of 24% and 75% in two different scenarios, respectively. The source code can be found at: https://github.com/aiheiheiheii/Partition_Scheduling.git.
DOI:	10.1109/JIOT.2025.3591531
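
Illustration:	The Summary describes simulating node-level queuing delays and pipeline stalls to turn a partitioning-offloading choice into a latency objective. The sketch below is not the authors' algorithm (their code is at the repository cited in the Summary); it is a generic list-scheduling simulation under simplifying assumptions: each node runs one subtask at a time, transfer cost applies only across nodes, and subtasks are dispatched in topological order. The function name `simulate_makespan`, the node names, and all timing values are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def simulate_makespan(compute, deps, placement, transfer):
    """Estimate the end-to-end latency of one placement of DNN subtasks.

    compute:   {subtask: execution time on its assigned node}
    deps:      {subtask: set of predecessor subtasks}
    placement: {subtask: node name}
    transfer:  {(pred, succ): time to ship pred's output to another node}
    """
    node_free = {}  # time at which each node finishes its queued work
    finish = {}     # finish time of each subtask
    for task in TopologicalSorter(deps).static_order():
        # A subtask is ready once every predecessor has finished and, for
        # predecessors on a different node, their output has arrived.
        ready = 0.0
        for pred in deps.get(task, ()):
            arrival = finish[pred]
            if placement[pred] != placement[task]:
                arrival += transfer.get((pred, task), 0.0)
            ready = max(ready, arrival)
        node = placement[task]
        # Queuing delay: wait if the node is still busy with earlier work.
        start = max(ready, node_free.get(node, 0.0))
        finish[task] = start + compute[task]
        node_free[node] = finish[task]
    return max(finish.values())


# Hypothetical four-subtask DAG: a feeds b and c, which both feed d.
deps = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
compute = {"a": 2, "b": 3, "c": 4, "d": 1}
transfer = {("a", "c"): 1, ("c", "d"): 1}

all_local = {t: "device" for t in compute}
offload_c = {**all_local, "c": "edge"}

local_latency = simulate_makespan(compute, deps, all_local, transfer)
offload_latency = simulate_makespan(compute, deps, offload_c, transfer)
print(local_latency, offload_latency)
```

In this toy case, offloading subtask c lets it overlap with b, so the simulated latency drops despite the added transfer time. A search over placements could use such a simulation as its objective, which is the kind of coupling between scheduling and partitioning-offloading that the Summary refers to.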