Delay-Aware Power Control for Downlink Multi-User MIMO via Constrained Deep Reinforcement Learning

Bibliographic Details
Published in:	2021 IEEE Global Communications Conference (GLOBECOM), pp. 1-6
Main Authors:	Tian, Chang; Huang, Guan; Liu, An; Luo, Wu
Format:	Conference Proceeding
Language:	English
Published:	IEEE 01.12.2021
Subjects:	Approximation algorithms; Costs; Downlink; Power control; Power demand; Reinforcement learning; Simulation

Description
Summary:	We investigate downlink transmission in a multi-user multiple-input multiple-output (MU-MIMO) system that adopts the regularized zero-forcing (RZF) precoder and optimizes the power allocation and regularization factor. Our aim is to find a power allocation and regularization factor control policy that minimizes the long-term average power consumption subject to a long-term delay constraint for each user. The resulting optimization problem is formulated as a constrained Markov decision process (CMDP) and solved efficiently by the proposed constrained deep reinforcement learning algorithm, called successive convex approximation policy optimization (SCAPO). SCAPO solves a sequence of convex objective/feasibility optimization problems obtained by replacing the objective and constraint functions of the original problem with convex surrogate functions. At each iteration, SCAPO only needs to estimate first-order information and solve a convex surrogate problem that can be handled efficiently in parallel. Moreover, SCAPO can reuse old experiences from previous updates, thereby significantly reducing the implementation cost. Numerical results show that SCAPO achieves state-of-the-art performance compared with advanced baselines.
DOI:	10.1109/GLOBECOM46510.2021.9685617
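
Note:	The summary names the RZF precoder together with a per-user power allocation and a regularization factor as the quantities being controlled. As a rough illustration only, the sketch below builds a textbook RZF precoder in NumPy. The record does not give the paper's system model, so everything here is an assumption: single-antenna users, a K x M channel matrix `H`, per-user column normalization, and the hypothetical names `rzf_precoder`, `alpha`, and `powers`. It is not the authors' implementation and does not cover SCAPO itself.

```python
import numpy as np


def rzf_precoder(H, alpha, powers):
    """Textbook regularized zero-forcing precoder (illustrative sketch).

    H      : (K, M) complex channel matrix, K single-antenna users, M antennas
             (assumed model, not taken from the paper).
    alpha  : non-negative regularization factor.
    powers : length-K array of per-user transmit powers.
    Returns an (M, K) precoding matrix whose k-th column has squared norm powers[k].
    """
    H = np.asarray(H, dtype=complex)
    powers = np.asarray(powers, dtype=float)
    K = H.shape[0]
    # Regularized Gram matrix G = H H^H + alpha * I (Hermitian).
    G = H @ H.conj().T + alpha * np.eye(K)
    # W = H^H G^{-1}; since G is Hermitian, W equals (G^{-1} H)^H.
    W = np.linalg.solve(G, H).conj().T
    # Normalize each user's beam to unit norm, then scale by its power.
    W = W / np.linalg.norm(W, axis=0, keepdims=True)
    return W * np.sqrt(powers)


# Small usage example with a random channel (values are arbitrary).
rng = np.random.default_rng(0)
H_demo = (rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))) / np.sqrt(2)
p_demo = np.array([0.5, 1.0, 1.5])
W_demo = rzf_precoder(H_demo, alpha=0.1, powers=p_demo)
total_power = float(np.sum(np.abs(W_demo) ** 2))  # equals p_demo.sum() up to rounding
```

In this sketch a larger `alpha` trades residual inter-user interference for better-conditioned beams, which is the kind of knob the summary says the control policy adjusts alongside the power allocation.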