Optimal scheduling of shared autonomous electric vehicles with multi-agent reinforcement learning: A MAPPO-based approach

Bibliographic Details
Published in: Neurocomputing (Amsterdam), Vol. 622, p. 129343
Main Authors: Tian, Jingjing; Jia, Hongfei; Wang, Guanfeng; Huang, Qiuyang; Wu, Ruiyi; Gao, Heyao; Liu, Chao
Format: Journal Article
Language: English
Published: Elsevier B.V., 14.03.2025
ISSN:0925-2312
Description
Summary: The advent of shared autonomous electric vehicles (SAEVs) is expected to decarbonize the transport sector and enable large-scale mobility on demand. Given the highly dynamic nature of operating environments, it is appealing to investigate optimal scheduling decisions, comprising order matching, relocation, and charging/discharging with V2G technology. Recent literature has investigated these scheduling decisions with deep reinforcement learning (DRL) independently, neglecting the collaborative interactions among multiple autonomous vehicles. To this end, the fully dynamic scheduling task is described integrally as a partially observable Markov decision process (POMDP) under the constraints of time-dependent demand, spatio-temporally varying electricity prices, and available vehicles and chargers. We then design a robust and extensible framework that combines multi-agent proximal policy optimization (MAPPO) with a binary linear programming (BLP) model to solve the optimization problem, in which SAEVs are treated as agents that make decisions according to their local observations. Additionally, observation normalization and action masking are used to improve training efficiency. Numerical experiments are conducted with New York City yellow taxi data to simulate a real-time SAEV operating environment. The results demonstrate that the proposed MAPPO approach can enhance operators' revenue while improving the customer service rate compared with benchmark algorithms.
•A partially observable Markov decision process models the SAEV scheduling problem.
•A MAPPO-based framework is proposed to solve the optimization problem.
•An attention-based mechanism is developed to address the colossal state space.
•Experiments are conducted to demonstrate the validity of the MAPPO-based algorithm.
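The abstract notes that action masking is used to improve training efficiency. A minimal sketch of the common logit-level form of this technique, in which infeasible actions receive near-zero probability before sampling (the four-action space and all names here are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def masked_softmax(logits, mask):
    """Action masking: set the logits of infeasible actions (mask == 0)
    to a large negative value so their softmax probability is ~0."""
    masked = np.where(mask.astype(bool), logits, -1e9)
    exp = np.exp(masked - masked.max())  # subtract max for numerical stability
    return exp / exp.sum()

# Hypothetical SAEV action space: [match order, relocate, charge, discharge].
# A low-battery vehicle cannot discharge via V2G, so that action is masked.
logits = np.array([1.2, 0.5, 0.8, 2.0])
mask = np.array([1, 1, 1, 0])
probs = masked_softmax(logits, mask)
```

Even though "discharge" has the highest raw logit, the mask drives its probability to essentially zero, so the policy only samples among feasible actions.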
DOI:10.1016/j.neucom.2025.129343