Personalized Car-Following Strategy Based on Convolutional Variational Autoencoder and Mildly Conservative Q-Learning



Published in: IEEE Transactions on Vehicular Technology, Vol. 74, No. 9, pp. 13372-13386
Main Authors: He, Rui; Chang, Yupeng; Zhang, Sumin; Meng, Zhiwei; Li, Wenfeng
Format: Journal Article
Language: English
Published: New York: IEEE, 01.09.2025
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 0018-9545, 1939-9359
Description
Summary: With the development of intelligent vehicles, the importance of imitating, learning, and optimizing human driving strategies is increasingly recognized. Offline reinforcement learning offers a promising approach, capable of directly extracting driving strategies from diverse real-world datasets, particularly in critical vehicle safety systems, where it provides advantages in interpretability and transferability. We propose a personalized car-following strategy framework based on a one-dimensional convolutional variational autoencoder (conVAE) and mildly conservative Q-learning (MCQ). The conVAE extracts subtle feature differences among drivers, classifying their driving styles as the foundation for strategy imitation. Additionally, an uncertainty prediction model for the preceding vehicle's behavior is established. By integrating the conVAE model, a personalized reward function is designed, accounting for both safety and comfort and considering the long-term impact of the preceding vehicle's behavior to imitate human-like driving strategies. Finally, the MCQ method optimizes the personalized conVAE strategy, allowing for appropriate exploration beyond the dataset, improving model generalization, and surpassing human performance. Tests on the HighD dataset demonstrate the proposed strategy's superiority in imitating human driver behavior in gap selection, speed variation, and acceleration adjustments. Validation on real-world datasets further confirms its generalization ability and performance, comparable to or better than human drivers.
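As a rough illustration of the safety-and-comfort reward the summary describes, the sketch below combines a headway-based safety term with a penalty on harsh acceleration and jerk, weighted per driving style. The functional forms (Gaussian headway term, quadratic comfort penalty), the parameter names, and all numeric weights are assumptions for illustration only, not the authors' formulation, which additionally accounts for the preceding vehicle's predicted behavior.

```python
import math

def car_following_reward(time_headway, accel, jerk,
                         style_weights=(1.0, 0.5),
                         desired_headway=1.5):
    """Hedged sketch of a personalized car-following reward.

    time_headway: gap to the preceding vehicle divided by ego speed [s]
    accel, jerk:  ego acceleration [m/s^2] and jerk [m/s^3]
    style_weights / desired_headway: hypothetical per-style parameters
    (e.g., a conservative driver might prefer a larger headway).
    """
    w_safety, w_comfort = style_weights
    # Safety: reward peaks when the time headway matches the driver's
    # style-specific desired headway and falls off smoothly on either side.
    r_safety = math.exp(-((time_headway - desired_headway) ** 2))
    # Comfort: quadratic penalty on harsh acceleration and jerk.
    r_comfort = -(0.1 * accel ** 2 + 0.05 * jerk ** 2)
    return w_safety * r_safety + w_comfort * r_comfort
```

Under this sketch, driving at the style's desired headway with smooth control yields the highest reward, while tailgating or abrupt acceleration changes reduce it.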
DOI: 10.1109/TVT.2025.3559314