SMAC-tuned Deep Q-learning for Ramp Metering

Bibliographic Details
Published in:	2023 IEEE International Conference on Smart Mobility (SM) pp. 65 - 72
Main Authors:	ElSamadisy, Omar; Abdulhai, Yazeed; Xue, Haoyuan; Smirnov, Ilia; Khalil, Elias B.; Abdulhai, Baher
Format:	Conference Proceeding
Language:	English
Published:	IEEE 19.03.2023
Subjects:	Convergence; Deep reinforcement learning; Hands; Hyperparameter optimization; Q-learning; Ramp metering; Reinforcement learning; Hyper-parameter tuning; Traffic control; Transportation; Tuning; Urban areas

Description
Summary:	The demand for transportation increases as a city's population grows, yet significant expansion is not feasible because of spatial, financial, and environmental limitations. As a result, improving infrastructure efficiency is becoming increasingly critical. Ramp metering (RM) with deep reinforcement learning (RL) is one method to tackle this problem. However, fine-tuning RL hyperparameters for RM is yet to be explored in the literature, potentially leaving performance improvements on the table. In this paper, the Sequential Model-based Algorithm Configuration (SMAC) method is used to fine-tune two essential hyperparameters of the deep RL ramp metering model: the discount factor and the decay of the explore/exploit ratio. Around 350 experiments with different configurations were run with PySMAC (a Python interface to the hyperparameter optimization tool SMAC) and compared to random search as a baseline. The best discount factor found indicates that the RL agent should focus on immediate rewards and pay little attention to future rewards, while the selected exploration decay rate indicates that the agent should decrease its exploration rate early. Both random search and SMAC achieve the same performance improvement of 19% in output flow from the freeway bottleneck, but SMAC converges earlier. This performance exceeds that of the baseline ramp metering techniques, ALINEA and deep reinforcement learning (DRL) without hyperparameter fine-tuning.
DOI:	10.1109/SM57895.2023.10112246
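
The two hyperparameters named in the summary, and the random search baseline, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the search ranges, the exponential form of the exploration decay, and the toy objective are all assumptions made for illustration. In the study, each evaluation would instead train the ramp metering agent and return a traffic performance measure.

```python
import random


def discounted_return(rewards, gamma):
    """Sum of rewards weighted by gamma**t; a small gamma favours immediate rewards."""
    return sum(r * gamma**t for t, r in enumerate(rewards))


def epsilon_at(step, decay, eps_start=1.0, eps_min=0.05):
    """Exploration rate after `step` updates, assuming exponential decay with a floor."""
    return max(eps_min, eps_start * decay**step)


def random_search(objective, n_trials, seed=0):
    """Sample configurations uniformly and keep the best-scoring one."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {
            "gamma": rng.uniform(0.0, 0.99),        # assumed search range
            "eps_decay": rng.uniform(0.90, 0.999),  # assumed search range
        }
        score = objective(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


def toy_objective(cfg):
    """Hypothetical stand-in for a full training run.

    The peak location is arbitrary and is not a result from the paper.
    """
    return -((cfg["gamma"] - 0.3) ** 2 + (cfg["eps_decay"] - 0.95) ** 2)


if __name__ == "__main__":
    best_cfg, best_score = random_search(toy_objective, n_trials=350)
    print(best_cfg, best_score)
```

A model-based configurator such as SMAC would replace the uniform sampling inside `random_search` with proposals from a surrogate model fitted to earlier evaluations, which is one plausible reason it could converge in fewer trials.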