CVAE-based Far-sighted Intention Inference for Opponent Modeling in Multi-agent Reinforcement Learning

Bibliographic Details
Published in:	Chinese Control Conference, pp. 5847-5851
Main Authors:	Pei, Yu; Xu, XiaoPeng; Liu, Zhong; Wang, Kuo; Zhu, Li; Wang, Dong
Format:	Conference Proceeding
Language:	English
Published:	Technical Committee on Control Theory, Chinese Association of Automation, 28 July 2024
Subjects:	Analytical models; Attention mechanisms; conditional variational autoencoder (CVAE); Decision making; Games; intention inference; Multi-agent reinforcement learning; opponent modeling; Prediction algorithms; Predictive models; Reinforcement learning
ISSN:	1934-1768
Online Access:	Full text
Description
Summary:	Most interactive environments are non-stationary for agents because the behaviors of their opponents continually change, which can impair the performance of reinforcement learning algorithms. This impairment can be alleviated by modeling opponents to predict their future movements. To predict more precisely and further into the future than current opponent modeling approaches, we developed a CVAE-based Far-sighted Intention Inference method (CFI2), comprising a Trajectory Prediction Module (TPM) and a Trajectory Analysis Module (TAM). TPM synthesizes complex interactions between agents using an attention mechanism and achieves robust far-sighted prediction with a conditional variational autoencoder (CVAE). TAM enables agents to analyze trajectories by assigning attention to the predicted movements of their opponents according to their impact on the future. We conducted experiments in the Drone Game, where CFI2 achieves significantly higher rewards more rapidly than baseline methods. It is proven that agents can make better decisions by incorporating long-term predictions, much like the decision-making process of humans.
DOI:	10.23919/CCC63176.2024.10662404