Goal-Oriented Reinforcement Learning in THz-Enabled UAV-Aided Network Using Supervised Learning

Bibliographic Details
Published in: IEEE Open Journal of the Communications Society, Vol. 5, pp. 5027-5036
Main Authors: Termehchi, Atefeh; Bao, Tingnan; Syed, Aisha; Sean Kennedy, William; Erol-Kantarci, Melike
Format: Journal Article
Language: English
Published: IEEE 2024
ISSN: 2644-125X
Description
Summary: Deep reinforcement learning (DRL) has been a key machine learning technique in many 5G and 6G applications. DRL agents learn optimal (or sub-optimal) policies by interacting with the environment. However, this process often involves numerous uninformative and repetitive message transmissions between the DRL agent and its environment. In this paper, we address the problem of reducing interactions between the DRL agent and the environment, called goal-oriented DRL. Meanwhile, Terahertz (THz) bands and unmanned aerial vehicles (UAVs) are considered two of the main enablers of 6G. Therefore, we investigate the goal-oriented DRL problem in a THz-enabled UAV-aided network. We formulate it as an optimization problem with the goals of i) reducing interactions between the UAV (DRL agent) and IoT devices (environment), ii) maximizing the number of served IoT devices, and iii) ensuring fairness. The constraints include the movement characteristics of IoT devices, the maximum speed limitation of the UAV, the QoS requirements of the served IoT devices, and the limited uplink coverage of the THz-enabled UAV. This problem is a mixed-integer nonlinear programming (MINLP) problem and is NP-hard. To address it, we employ the decoupling optimization method and an approach inspired by the self-triggered method from control engineering. Specifically, the problem is divided into two sub-problems. We then propose using supervised learning as a teacher for DRL to reduce the interactions. Our simulation results show that the goal-oriented DRL approach outperforms conventional methods by reducing interactions while maintaining good performance in terms of the number of served IoT devices and fairness.
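The three goals named in the summary can be illustrated with a minimal Python sketch. It is not the authors' method: the circular coverage model, the use of Jain's index as the fairness measure, and the worst-case-speed polling rule (which stands in for the paper's supervised-learning teacher) are all assumptions made for illustration, and every function and parameter name is hypothetical.

```python
import math


def served_devices(uav_xy, device_xy, coverage_radius):
    """Indices of devices inside the UAV's uplink coverage.

    Assumption: coverage is a circle of radius `coverage_radius`
    around the UAV's ground projection.
    """
    return [
        i
        for i, (x, y) in enumerate(device_xy)
        if math.hypot(x - uav_xy[0], y - uav_xy[1]) <= coverage_radius
    ]


def jain_fairness(service_counts):
    """Jain's fairness index over per-device service counts.

    Assumption: the record does not say which fairness measure the
    paper uses. The index is 1.0 when all devices are served equally
    and 1/n when a single device receives all the service.
    """
    total = sum(service_counts)
    if total == 0:
        return 0.0
    n = len(service_counts)
    return total ** 2 / (n * sum(c * c for c in service_counts))


def steps_until_next_poll(distance_to_edge, max_device_speed,
                          step_duration, max_steps=10):
    """Self-triggered rule: how many time steps the UAV may skip polling.

    A device that is `distance_to_edge` metres inside the coverage
    boundary cannot leave coverage sooner than
    distance_to_edge / max_device_speed seconds, so the UAV need not
    query it before then. The result is clamped to [1, max_steps].
    """
    if max_device_speed <= 0:
        return max_steps
    safe_steps = int(distance_to_edge / (max_device_speed * step_duration))
    return max(1, min(safe_steps, max_steps))
```

The point of the sketch is that the agent fixes its next interaction time in advance from a prediction about the environment, instead of exchanging messages at every step. In the paper that prediction comes from a supervised model; a conservative speed bound is used here only because it needs no training data.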
DOI: 10.1109/OJCOMS.2024.3442709