Multiendpoint DAG-Driven Joint Partitioning-Offloading and Scheduling Optimization for DNN Inference
| Published in: | IEEE Internet of Things Journal, Vol. 12, No. 19, pp. 41087-41102 |
|---|---|
| Main authors: | , , , , , |
| Format: | Journal Article |
| Language: | English |
| Publication details: | Piscataway: IEEE, 1 October 2025; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| ISSN: | 2327-4662 |
| Summary: | Model partitioning techniques, which decompose and collaboratively execute subtasks of deep neural networks (DNNs), have emerged as a critical strategy for enhancing distributed inference efficiency. However, in mobile edge computing (MEC), dynamic load fluctuations at edge nodes and the complexity of cross-node task dependencies make the delay minimization problem extremely challenging. Existing studies predominantly adopt a decoupled optimization framework that separately addresses partitioning-offloading and pipeline scheduling, neglecting their inherent cyclic state-dependent coupling. This oversight leads to suboptimal solutions, such as pipeline stagnation caused by mismatched computation and communication timestamps. To address these challenges, we propose a multiendpoint directed acyclic graph (DAG)-driven cooperative optimization approach, enabling partitioning-offloading and pipeline scheduling in MEC. Specifically, the approach involves two core steps: 1) Dynamic prescheduling: We propose an improved DNN scheduling algorithm for constrained subtasks, which simulates node-level queuing delays and pipeline stalls under real-world constraints, translating runtime states into latency objectives. 2) Partitioning and offloading solution retrieval: Based on latency objectives, we introduce a novel multiendpoint DAG structure and design a multinode collaborative optimization retrieval algorithm, enabling adaptive partitioning-offloading remapping of subtasks. Experiments demonstrate the superiority of the proposed method over other advanced methods, reducing the time overhead by an average of 24% and 75% in two different scenarios, respectively. The source code is available at: https://github.com/aiheiheiheii/Partition_Scheduling.git |
| DOI: | 10.1109/JIOT.2025.3591531 |
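
The summary describes turning simulated queuing delay into a latency objective that then drives the choice of partitioning and offloading. The following is a minimal sketch of that general idea only, not the authors' algorithm (their code is in the repository named in the summary). The function names, the FIFO-per-node model, the single uniform transfer cost, the exhaustive search, and all numbers are assumptions made for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from itertools import product


def simulate_latency(deps, compute, placement, transfer, busy_until):
    """Estimate the end-to-end delay of one inference for a given placement.

    deps:       {task: set of predecessor tasks} (a DAG, possibly with several end tasks)
    compute:    {(task, node): compute time}
    placement:  {task: node}
    transfer:   delay added when a predecessor ran on a different node (assumed uniform)
    busy_until: {node: time at which the node's existing queue drains}
    """
    node_free = dict(busy_until)  # copy so the caller's load snapshot is not mutated
    finish = {}
    for task in TopologicalSorter(deps).static_order():
        node = placement[task]
        # A task is ready once every predecessor has finished and its output has arrived.
        ready = max(
            (finish[p] + (transfer if placement[p] != node else 0.0) for p in deps[task]),
            default=0.0,
        )
        start = max(ready, node_free[node])  # wait in the node's queue if it is busy
        finish[task] = start + compute[(task, node)]
        node_free[node] = finish[task]
    return max(finish.values())  # the slowest end task determines the delay


def best_placement(deps, compute, nodes, transfer, busy_until):
    """Exhaustively search placements; feasible only for tiny illustrative DAGs."""
    tasks = list(deps)
    best = None
    for choice in product(nodes, repeat=len(tasks)):
        placement = dict(zip(tasks, choice))
        latency = simulate_latency(deps, compute, placement, transfer, busy_until)
        if best is None or latency < best[0]:
            best = (latency, placement)
    return best


# Hypothetical three-subtask DAG with two end tasks: a -> b and a -> c.
deps = {"a": set(), "b": {"a"}, "c": {"a"}}
nodes = ["device", "edge"]
compute = {(t, "device"): 4.0 for t in deps} | {(t, "edge"): 1.0 for t in deps}

light_load = best_placement(deps, compute, nodes, 1.0, {"device": 0.0, "edge": 2.0})
heavy_load = best_placement(deps, compute, nodes, 1.0, {"device": 0.0, "edge": 20.0})
print(light_load)
print(heavy_load)
```

With a lightly loaded edge node the search offloads every subtask, while a long edge queue pushes everything back onto the device. This is the kind of load-dependent coupling between scheduling state and placement that the summary says decoupled approaches miss.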