Multi-agent deep reinforcement learning for trajectory planning in UAVs-assisted mobile edge computing with heterogeneous requirements

Bibliographic details
Published in: Computer networks (Amsterdam, Netherlands : 1999), Vol. 248; p. 110469
Main authors: Fan, Chenchen; Xu, Hongyu; Wang, Qingling
Format: Journal Article
Language: English
Published: Elsevier B.V., 1 June 2024
ISSN:1389-1286, 1872-7069
Description
Summary: In heterogeneous wireless networks, massive numbers of user equipments (UEs) generate computing tasks with time-varying heterogeneous requirements. To improve service quality, this paper formulates an unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) framework for time-varying heterogeneous task requirements. In the framework, the task delay and the number of successfully executed tasks are optimized by jointly controlling the trajectories of multiple UAVs. To address this trajectory planning problem, a collaborative multi-agent deep reinforcement learning (MADRL) algorithm is proposed, in which each UAV is regarded as a learning agent. First, a counterfactual-inference-based personalized policy update mechanism is proposed that evaluates each agent's independent policy by comparing it with a designed counterfactual policy. Based on this idea, each agent updates a personalized policy that accounts for both group and individual interests, improving its ability to cooperate in dynamic and complex environments. Then, a diversified experience sampling mechanism is proposed to make policy evaluation and update more efficient, drawing on rich experiences provided by both environment interaction and a modified whale optimization algorithm. Finally, evaluation results demonstrate the superiority and effectiveness of the proposed MADRL algorithm.
DOI:10.1016/j.comnet.2024.110469