Personalized Car-Following Strategy Based on Convolutional Variational Autoencoder and Mildly Conservative Q-Learning
| Published in: | IEEE Transactions on Vehicular Technology, Vol. 74, No. 9, pp. 13372-13386 |
|---|---|
| Main authors: | , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | New York: IEEE, 01.09.2025 (The Institute of Electrical and Electronics Engineers, Inc. (IEEE)) |
| Subjects: | |
| ISSN: | 0018-9545, 1939-9359 |
| Online access: | Full text |
| Abstract: | With the development of intelligent vehicles, the importance of imitating, learning, and optimizing human driving strategies is increasingly recognized. Offline reinforcement learning offers a promising approach, capable of directly extracting driving strategies from diverse real-world datasets, particularly in critical vehicle safety systems, where it provides advantages in interpretability and transferability. We propose a personalized car-following strategy framework based on a one-dimensional convolutional variational autoencoder (conVAE) and mildly conservative Q-learning (MCQ). The conVAE extracts subtle feature differences among drivers, classifying their driving styles as the foundation for strategy imitation. Additionally, an uncertainty prediction model for the preceding vehicle's behavior is established. By integrating the conVAE model, a personalized reward function is designed, accounting for both safety and comfort and considering the long-term impact of the preceding vehicle's behavior to imitate human-like driving strategies. Finally, the MCQ method optimizes the personalized conVAE strategy, allowing for appropriate exploration beyond the dataset, improving model generalization, and surpassing human performance. Tests on the HighD dataset demonstrate the proposed strategy's superiority in imitating human driver behavior in gap selection, speed variation, and acceleration adjustments. Validation on real-world datasets further confirms its generalization ability and performance, comparable to or better than human drivers. |
|---|---|
| DOI: | 10.1109/TVT.2025.3559314 |
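The abstract's pipeline begins with a one-dimensional convolutional variational autoencoder (conVAE) that encodes windows of car-following trajectories into a latent space used to classify driving styles. Below is a minimal sketch of such a model, assuming PyTorch; the window length, the input channels (gap, relative speed, ego acceleration), and all layer sizes are illustrative assumptions, not the paper's architecture.

```python
# Minimal 1D convolutional VAE sketch for encoding car-following windows.
# SEQ_LEN, N_FEATURES, and LATENT_DIM are assumed values, not from the paper.
import torch
import torch.nn as nn

SEQ_LEN = 64      # assumed length of each trajectory window (time steps)
N_FEATURES = 3    # assumed channels: gap, relative speed, ego acceleration
LATENT_DIM = 8    # assumed latent size used to separate driving styles


class ConVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(N_FEATURES, 16, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.Flatten(),
        )
        enc_out = 32 * (SEQ_LEN // 4)
        self.fc_mu = nn.Linear(enc_out, LATENT_DIM)
        self.fc_logvar = nn.Linear(enc_out, LATENT_DIM)
        self.fc_dec = nn.Linear(LATENT_DIM, enc_out)
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(32, 16, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose1d(16, N_FEATURES, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, x):
        # x: (batch, N_FEATURES, SEQ_LEN)
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization trick: sample z while keeping gradients.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        h = self.fc_dec(z).view(-1, 32, SEQ_LEN // 4)
        return self.decoder(h), mu, logvar


def vae_loss(recon, x, mu, logvar):
    # Reconstruction term plus KL divergence to a standard normal prior.
    recon_term = nn.functional.mse_loss(recon, x, reduction="mean")
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_term + kl
```

After training, the latent means `mu` of each driver's windows can be clustered (e.g., with k-means) to assign style labels, which the personalized reward can then condition on.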
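The abstract describes a personalized reward balancing safety, comfort, and style imitation, but does not state its exact form. The following is therefore a hypothetical composition: the weights `w_*`, the TTC-based safety term, and the style-dependent target time headway `tau_style` are all assumptions introduced for illustration.

```python
# Illustrative (not the paper's) personalized car-following reward.
import numpy as np

def personalized_reward(gap, v_ego, v_lead, a_ego,
                        tau_style=1.5, w_safe=1.0, w_style=0.5, w_comfort=0.1):
    # Safety: penalize small time-to-collision when closing on the leader.
    closing = max(v_ego - v_lead, 1e-3)
    ttc = gap / closing
    r_safe = -w_safe * np.exp(-ttc / 2.0)      # near zero unless TTC is small

    # Style imitation: keep time headway near the driver's preferred value,
    # where tau_style comes from the conVAE style class (assumed mapping).
    headway = gap / max(v_ego, 1e-3)
    r_style = -w_style * (headway - tau_style) ** 2

    # Comfort: discourage harsh acceleration and braking.
    r_comfort = -w_comfort * a_ego ** 2

    return r_safe + r_style + r_comfort
```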
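Finally, mildly conservative Q-learning (MCQ) optimizes the imitated strategy while permitting limited exploration beyond the dataset. A hedged sketch of the core MCQ critic update follows: out-of-distribution (OOD) actions are regressed toward a pseudo-target built from actions the behavior policy actually supports, so values outside the data are suppressed only mildly rather than heavily penalized. The module names (`critic`, `actor`, `behavior`), the sampler interface, and the mixing weight `lam` are assumptions for this sketch.

```python
# Hedged sketch of an MCQ-style critic loss, assuming PyTorch modules.
import torch
import torch.nn.functional as F

def mcq_critic_loss(critic, target_critic, actor, behavior,
                    batch, gamma=0.99, lam=0.7, n_support=10):
    s, a, r, s_next, done = batch  # tensors drawn from the offline dataset

    # Standard Bellman target on in-distribution transitions.
    with torch.no_grad():
        a_next = actor(s_next)
        y = r + gamma * (1.0 - done) * target_critic(s_next, a_next)
    td_loss = F.mse_loss(critic(s, a), y)

    # Mildly conservative pseudo-target for OOD actions: the best value
    # among n_support actions sampled from the behavior policy at s.
    with torch.no_grad():
        s_rep = s.repeat_interleave(n_support, dim=0)
        a_support = behavior.sample(s_rep)             # assumed sampler API
        q_support = critic(s_rep, a_support).view(-1, n_support)
        pseudo_y = q_support.max(dim=1, keepdim=True).values
    a_ood = actor(s).detach()                          # likely OOD actions
    ood_loss = F.mse_loss(critic(s, a_ood), pseudo_y)

    # lam trades off fitting the dataset against mild conservatism on
    # out-of-distribution actions, as in the MCQ formulation.
    return lam * td_loss + (1.0 - lam) * ood_loss
```

Weighting the OOD term with `1 - lam` is what keeps the conservatism "mild": the critic is anchored to in-support values rather than pushed toward a hard penalty, which is what allows the learned strategy to explore slightly beyond the dataset without value overestimation.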