Optimal scheduling of shared autonomous electric vehicles with multi-agent reinforcement learning: A MAPPO-based approach
| Published in: | Neurocomputing (Amsterdam), Volume 622, p. 129343 |
|---|---|
| Main authors: | , , , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V, 14.03.2025 |
| Subjects: | |
| ISSN: | 0925-2312 |
| Online access: | Get full text |
| Summary: | The advent of shared autonomous electric vehicles (SAEVs) is expected to decarbonize the transport sector and enable large-scale mobility on demand. Considering the highly dynamic nature of operation environments, it is appealing to investigate the optimal schedule decisions, comprising order-matching, relocation, and charge/discharge with the V2G technology. Recent literature has investigated these schedule decisions with deep reinforcement learning (DRL) independently, neglecting the collaborative interactions among multiple autonomous vehicles. To this end, the fully dynamic schedule task is integrally described as a partially observable Markov decision process (POMDP) under the constraints of time-dependent demand, spatial-temporal varying electricity price, available vehicles, and chargers. Then, we design a robust and extensible framework that combines multi-agent proximal policy optimization (MAPPO) and a binary linear programming (BLP) model to solve the optimization problem, in which SAEVs are regarded as agents that make decisions according to their local observations. Additionally, observation normalization and action masking are utilized to improve training efficiency. Numerical experiments are conducted with New York City yellow taxi data to simulate the real-time SAEV operating environment. The results demonstrate that the proposed MAPPO approach is expected to enhance the operators' revenue while improving the customer service rate compared to benchmark algorithms. |
|---|---|
| Highlights: | • Use a partially observable Markov decision process to model the SAEV schedule problem. • A MAPPO-based framework is proposed to solve the optimization problem. • An attention-based mechanism is developed to address the colossal state space. • Experiments are conducted to demonstrate the validity of the MAPPO-based algorithm. |
| DOI: | 10.1016/j.neucom.2025.129343 |
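The abstract notes that action masking is used to improve training efficiency. A common way to realize this in policy-gradient methods such as PPO/MAPPO is to assign invalid actions a logit of negative infinity before the softmax, so they receive zero probability. The sketch below is a minimal, generic illustration of that idea; the action names and mask layout are hypothetical and not taken from the paper.

```python
import numpy as np

def masked_softmax(logits, mask):
    """Turn raw policy logits into action probabilities, zeroing out
    invalid actions. `mask` is True where an action is feasible."""
    masked = np.where(mask, logits, -np.inf)  # invalid actions -> -inf logit
    z = masked - masked.max()                 # subtract max for numerical stability
    exp = np.exp(z)
    return exp / exp.sum()

# Hypothetical SAEV agent with four candidate actions
# (match order, relocate, charge, discharge); suppose "discharge"
# is infeasible at the current battery level, so it is masked out.
logits = np.array([1.0, 0.5, 0.2, 2.0])
mask = np.array([True, True, True, False])
probs = masked_softmax(logits, mask)
```

Because the masked logit is -inf, the invalid action gets exactly zero probability and contributes no gradient, while the remaining probabilities still sum to one.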