Joint Multi-objective Optimization for Radio Access Network Slicing Using Multi-agent Deep Reinforcement Learning
| Published in: | IEEE Transactions on Vehicular Technology, Vol. 72, no. 9, pp. 1-16 |
|---|---|
| Format: | Journal Article |
| Language: | English |
| Published: | New York; IEEE; 01.09.2023; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| ISSN: | 0018-9545, 1939-9359 |
| Summary: | Radio access network (RAN) slices can provide various customized services for next-generation wireless networks. Thus, multiple performance metrics of different types of RAN slices need to be jointly optimized. However, existing work on the multi-objective optimization problem (MOOP) for RAN slicing considers only the scalar form, which makes simultaneous optimization difficult to achieve. In this paper, we consider a non-scalar MOOP for RAN slicing with three types of slices, i.e., the high-bandwidth slice, the low-delay slice, and the wide-coverage slice, over the same underlying physical network. We jointly optimize the throughput, the transmission delay, and the coverage area through user-oriented dynamic deployment of virtual base stations (vBSs) and sub-channel and power allocation. An improved multi-agent deep deterministic policy gradient (IMADDPG) algorithm, characterized by centralized training and distributed execution, is proposed to solve the above non-deterministic polynomial-time hard (NP-hard) problem. The rank voting method is introduced in the inference process to obtain near-Pareto optimal solutions. Simulation results verify that the proposed scheme achieves better performance than the traditional scalar utility method and other benchmark algorithms. The proposed scheme has the advantage of flexibly approaching any point of the Pareto boundary, while the traditional scalar method only subjectively approaches one of the Pareto optimal solutions. Furthermore, our proposal strikes a compelling tradeoff among three types of RAN slices due to the non-dominance between Pareto optimal solutions. |
| DOI: | 10.1109/TVT.2023.3268671 |
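
The summary contrasts a scalar utility, which settles on a single Pareto optimal solution, with a non-scalar treatment that can reach any point of the Pareto boundary. The sketch below illustrates only that general distinction on made-up numbers. It is not the paper's IMADDPG algorithm or its rank voting method; the function names (`dominates`, `pareto_front`, `weighted_sum_best`), the candidate values, and the convention of negating delay so that every objective is "higher is better" are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical objective vector: (throughput, -delay, coverage).
# Delay is negated so that larger is better for every component.
Point = Tuple[float, float, float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates (input order preserved)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def weighted_sum_best(points: List[Point], weights: Sequence[float]) -> Point:
    """Scalar utility: collapse the objectives with fixed weights, take the max."""
    return max(points, key=lambda p: sum(w * x for w, x in zip(weights, p)))


# Made-up candidate allocations, purely for illustration.
candidates: List[Point] = [
    (10.0, -5.0, 2.0),  # favors throughput
    (8.0, -2.0, 3.0),   # favors low delay
    (6.0, -4.0, 6.0),   # favors coverage
    (7.0, -5.0, 2.0),   # dominated by the first candidate
    (5.0, -6.0, 1.0),   # dominated by the first candidate
]

front = pareto_front(candidates)
equal_weight_pick = weighted_sum_best(candidates, (1.0, 1.0, 1.0))

print("Pareto front:", front)
print("Equal-weight scalar pick:", equal_weight_pick)
```

With these numbers the front holds three mutually non-dominated candidates, while any one fixed weight vector returns just one of them; changing the weights moves the pick to a different point, which is the subjectivity the summary attributes to the scalar method.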