Multiendpoint DAG-Driven Joint Partitioning-Offloading and Scheduling Optimization for DNN Inference
Model partitioning techniques, which decompose and collaboratively execute subtasks of deep neural networks (DNNs), have emerged as a critical strategy for enhancing distributed inference efficiency. However, in mobile edge computing (MEC), dynamic load fluctuations at edge nodes and the complexity...
Saved in:
| Published in: | IEEE internet of things journal Vol. 12; no. 19; pp. 41087 - 41102 |
|---|---|
| Main Authors: | , , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: |
Piscataway
IEEE
01.10.2025
The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Subjects: | |
| ISSN: | 2327-4662, 2327-4662 |
| Online Access: | Get full text |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| Summary: | Model partitioning techniques, which decompose and collaboratively execute subtasks of deep neural networks (DNNs), have emerged as a critical strategy for enhancing distributed inference efficiency. However, in mobile edge computing (MEC), dynamic load fluctuations at edge nodes and the complexity of cross-node task dependencies make the delay minimization problem extremely challenging. Existing studies predominantly adopt a decoupled optimization framework that separately addresses partitioning-offloading and pipeline scheduling, neglecting their inherent cyclic state-dependent coupling. This oversight leads to suboptimal solutions, such as pipeline stagnation caused by mismatched computation and communication timestamps. To address these challenges, we propose a multiendpoint directed acyclic graph (DAG)-driven cooperative optimization approach, enabling partitioning-offloading and pipeline scheduling in MEC. Specifically, the approach involves two core steps: 1) Dynamic prescheduling: We propose an improved DNN scheduling algorithm for constrained subtasks, which simulates node-level queuing delays and pipeline stalls under real-world constraints, translating runtime states into latency objectives. 2) Partitioning and offloading solution retrieval: Based on latency objectives, we introduce a novel multiendpoint DAG structure and design a multinode collaborative optimization retrieval algorithm, enabling adaptive partitioning-offloading remapping of subtasks. Experiments demonstrate the superiority of the proposed method over other advanced methods, reducing the time overhead by an average of 24% and 75% in two different scenarios, respectively. The resource code can be found at: https://github.com/aiheiheiheii/Partition_Scheduling.git . |
|---|---|
| Bibliography: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
| ISSN: | 2327-4662 2327-4662 |
| DOI: | 10.1109/JIOT.2025.3591531 |