Multiendpoint DAG-Driven Joint Partitioning-Offloading and Scheduling Optimization for DNN Inference

Bibliographic Details
Published in: IEEE Internet of Things Journal, Vol. 12, no. 19, pp. 41087-41102
Main Authors: Yan, Xiukun, Zhang, Xuexue, Zeng, Kai, Bai, Fenhua, Shen, Tao, Cao, Bin
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.10.2025
ISSN: 2327-4662
Description
Summary: Model partitioning techniques, which decompose deep neural networks (DNNs) into subtasks that are executed collaboratively, have emerged as a critical strategy for improving distributed inference efficiency. However, in mobile edge computing (MEC), dynamic load fluctuations at edge nodes and complex cross-node task dependencies make delay minimization extremely challenging. Existing studies predominantly adopt a decoupled optimization framework that addresses partitioning-offloading and pipeline scheduling separately, neglecting the cyclic, state-dependent coupling between them. This oversight leads to suboptimal solutions, such as pipeline stagnation caused by mismatched computation and communication timestamps. To address these challenges, we propose a multiendpoint directed acyclic graph (DAG)-driven cooperative optimization approach that jointly performs partitioning-offloading and pipeline scheduling in MEC. The approach involves two core steps: 1) Dynamic prescheduling: we propose an improved DNN scheduling algorithm for constrained subtasks, which simulates node-level queuing delays and pipeline stalls under real-world constraints and translates runtime states into latency objectives. 2) Partitioning and offloading solution retrieval: based on these latency objectives, we introduce a novel multiendpoint DAG structure and design a multinode collaborative optimization retrieval algorithm that adaptively remaps the partitioning and offloading of subtasks. Experiments show that the proposed method outperforms other advanced methods, reducing time overhead by an average of 24% and 75% in two different scenarios, respectively. The source code is available at: https://github.com/aiheiheiheii/Partition_Scheduling.git
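The summary does not spell out either algorithm, so the following is only a toy illustration of why partitioning and scheduling are coupled, not the authors' method (their implementation is in the repository linked in the summary). It assumes a chain-structured DNN with per-layer work `costs` and output sizes `out_sizes`, one end device, FIFO queues at edge nodes, and transfers that always get the full `bandwidth` (no link contention). All names (`Node`, `plan_request`, `schedule`) and units are hypothetical. For each request it exhaustively tries every split point k (layers before k on the device, the rest on an edge node) against the current queue state, then commits the fastest option and updates node availability, a greedy stand-in for "prescheduling" that turns runtime state into a latency objective.

```python
from dataclasses import dataclass
from itertools import accumulate


@dataclass
class Node:
    name: str
    speed: float           # work units per second (hypothetical unit)
    ready_at: float = 0.0  # time at which this node's queue becomes free


def plan_request(costs, out_sizes, input_size, device, edges, bandwidth, release=0.0):
    """Best (finish_time, split, edge_name, device_end) for one request.

    Split k runs layers [0, k) on the device and layers [k, n) on one edge
    node; k == n means fully local execution. Greedy toy model only.
    """
    n = len(costs)
    prefix = [0.0, *accumulate(costs)]         # prefix[k] = work of layers [0, k)
    dev_start = max(release, device.ready_at)  # wait if the device is still busy
    best = None
    for k in range(n + 1):
        dev_end = dev_start + prefix[k] / device.speed
        if k == n:
            options = [(dev_end, k, None, dev_end)]
        else:
            data = input_size if k == 0 else out_sizes[k - 1]
            arrival = dev_end + data / bandwidth  # assumes no link contention
            options = []
            for e in edges:
                start = max(arrival, e.ready_at)  # queuing delay at a busy edge node
                finish = start + (prefix[n] - prefix[k]) / e.speed
                options.append((finish, k, e.name, dev_end))
        for opt in options:
            if best is None or opt[0] < best[0]:
                best = opt
    return best


def schedule(releases, costs, out_sizes, input_size, device, edges, bandwidth):
    """Plan requests in release order, committing each choice to node state."""
    by_name = {e.name: e for e in edges}
    plan = []
    for release in sorted(releases):
        finish, k, edge_name, dev_end = plan_request(
            costs, out_sizes, input_size, device, edges, bandwidth, release)
        device.ready_at = dev_end
        if edge_name is not None:
            by_name[edge_name].ready_at = finish  # edge busy until this subtask ends
        plan.append((k, edge_name, finish))
    return plan


if __name__ == "__main__":
    device = Node("device", speed=1.0)
    edges = [Node("e1", speed=10.0), Node("e2", speed=2.5)]
    plan = schedule([0, 0, 0, 0], costs=[4.0, 4.0, 2.0], out_sizes=[8.0, 1.0, 1.0],
                    input_size=10.0, device=device, edges=edges, bandwidth=2.0)
    for k, edge, finish in plan:
        print(k, edge, round(finish, 2))
```

With these made-up numbers, the first three requests send the raw input to `e1` (k = 0) and finish at 6, 7 and 8; by the fourth request `e1`'s queue has grown enough that the best choice shifts to running the first layer locally (k = 1), finishing at 8.6. The optimal split therefore depends on the scheduling state, which is the coupling a decoupled optimizer ignores. A real system would also model link contention, non-chain DAGs and pipelined execution, which this sketch omits.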
DOI: 10.1109/JIOT.2025.3591531