Joint Multi-objective Optimization for Radio Access Network Slicing Using Multi-agent Deep Reinforcement Learning

Bibliographic Details
Published in:	IEEE Transactions on Vehicular Technology, Vol. 72, no. 9, pp. 1-16
Main Authors:	Zhou, Guorong; Zhao, Liqiang; Zheng, Gan; Xie, Zhijie; Song, Shenghui; Chen, Kwang-Cheng
Format:	Journal Article
Language:	English
Published:	New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.09.2023
Subjects:	Algorithms; Delays; Heuristic algorithms; Hierarchies; multi-agent deep reinforcement learning; multi-objective optimization; Multiagent systems; Multiple objective analysis; Network slicing; non-scalarization; Optimization; Pareto optimization; Pareto optimum; Performance measurement; Polynomials; Radio access network slicing; rank voting method; Resource management; Throughput; Wireless networks
ISSN:	0018-9545, 1939-9359

Description
Summary:	Radio access network (RAN) slices can provide various customized services for next-generation wireless networks, so multiple performance metrics of different types of RAN slices need to be jointly optimized. However, existing work on the multi-objective optimization problem (MOOP) for RAN slicing uses only the scalar form, which makes simultaneous optimization difficult to achieve. In this paper, we consider a non-scalar MOOP for RAN slicing with three types of slices, i.e., the high-bandwidth slice, the low-delay slice, and the wide-coverage slice, over the same underlying physical network. We jointly optimize the throughput, the transmission delay, and the coverage area through user-oriented dynamic deployment of virtual base stations (vBSs) and through sub-channel and power allocation. An improved multi-agent deep deterministic policy gradient (IMADDPG) algorithm, characterized by centralized training and distributed execution, is proposed to solve this non-deterministic polynomial-time hard (NP-hard) problem. The rank voting method is introduced in the inference process to obtain near-Pareto optimal solutions. Simulation results verify that the proposed scheme achieves better performance than the traditional scalar utility method and other benchmark algorithms. The proposed scheme can flexibly approach any point of the Pareto boundary, whereas the traditional scalar method approaches only one of the Pareto optimal solutions, chosen subjectively. Furthermore, our proposal strikes a compelling tradeoff among the three types of RAN slices because Pareto optimal solutions do not dominate one another.
DOI:	10.1109/TVT.2023.3268671
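
The summary refers to non-dominance between Pareto optimal solutions and to a rank voting step at inference. As a rough illustration only, the Python sketch below filters a handful of candidate solutions to their non-dominated set and then picks one by a Borda-style rank sum. The record does not describe the paper's actual rank voting rule, so the rank-sum rule, the function names, and the candidate values are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# A candidate is (throughput, delay, coverage). Throughput and coverage are
# maximized and delay is minimized. All values below are made up.
Candidate = Tuple[float, float, float]


def to_maximize(c: Candidate) -> Tuple[float, float, float]:
    """Flip the sign of delay so that every objective is 'larger is better'."""
    return (c[0], -c[1], c[2])


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b everywhere and strictly better once."""
    fa, fb = to_maximize(a), to_maximize(b)
    return all(x >= y for x, y in zip(fa, fb)) and any(
        x > y for x, y in zip(fa, fb)
    )


def pareto_front(cands: Sequence[Candidate]) -> List[int]:
    """Indices of candidates that no other candidate dominates."""
    return [
        i
        for i, a in enumerate(cands)
        if not any(dominates(b, a) for j, b in enumerate(cands) if j != i)
    ]


def rank_vote(cands: Sequence[Candidate]) -> int:
    """Assumed Borda-style vote: each objective ranks the candidates
    (0 = best) and the candidate with the lowest rank sum wins.
    Ties are broken by input order, which is an arbitrary choice here."""
    n = len(cands)
    totals = [0] * n
    for k in range(3):
        order = sorted(
            range(n), key=lambda i: to_maximize(cands[i])[k], reverse=True
        )
        for rank, i in enumerate(order):
            totals[i] += rank
    return min(range(n), key=lambda i: totals[i])


def select(cands: Sequence[Candidate]) -> int:
    """Index (into cands) of the voted winner among non-dominated candidates."""
    front = pareto_front(cands)
    return front[rank_vote([cands[i] for i in front])]


candidates: List[Candidate] = [
    (100.0, 10.0, 50.0),
    (80.0, 5.0, 60.0),
    (90.0, 12.0, 40.0),  # dominated by the first candidate
    (60.0, 4.0, 70.0),
]
front_indices = pareto_front(candidates)
chosen_index = select(candidates)
```

The three candidates left on the front are mutually non-dominated, which is why a separate selection rule is needed at all; a scalar utility would instead fix the tradeoff in advance through its weights.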