Lyapunov-Informed Multi-Agent Reinforcement Learning for Multi-Robot Cooperation Tasks


Detailed Bibliography
Published in: IEEE Transactions on Automation Science and Engineering, Vol. 22, pp. 20263-20279
Main authors: Feng, Pu; Shi, Rongye; Wang, Size; Wu, Qizhen; Yu, Xin; Wu, Wenjun
Format: Journal Article
Language: English
Published: IEEE, 2025
ISSN: 1545-5955, 1558-3783
Description
Summary: Multi-Agent Reinforcement Learning (MARL) has shown great potential in solving complex tasks. Despite this success, low training efficiency remains a pervasive and long-standing challenge in MARL. To tackle this issue, it is promising to leverage prior knowledge or environmental properties to inform and improve MARL. We observe that many multi-agent tasks specify goal states where special rewards are granted, guiding agents toward the goal. Inspired by the theory of Lyapunov stability, an optimal policy for such tasks should intuitively converge asymptotically to the goal states from any initial state, making the goal states stable equilibria. Focusing on this type of task, we introduce the concept of the Lyapunov Markov game (LMG), a new subclass of the cooperative Markov game featuring a set of goal states and a goal-oriented reward function. We then provide a theoretical bound on the scaled value distance as a necessary condition for obtaining a stable suboptimal policy in an LMG. Motivated by this insight, we further propose Lyapunov-informed MARL, which leverages a newly designed Lyapunov-informed reward. Theoretical analysis shows that Lyapunov-informed MARL enjoys a broadened bound, allowing the training process to find a stable suboptimal policy more easily and then converge to an optimal policy more efficiently. Extensive experiments and real-world multi-robot implementations demonstrate the superior performance of the proposed approach over advanced baseline models.

Note to Practitioners: This paper addresses the training-efficiency problem in multi-robot cooperation using MARL. Coordinating multi-robot systems, such as uncrewed ground vehicles and drones, to accomplish tasks efficiently is challenging. While MARL offers potential solutions, its slow training process limits practical applications. Leveraging human expert knowledge can significantly alleviate this issue compared to relying solely on environment interaction. This paper integrates Lyapunov stability theory with MARL to discover suboptimal stable policies as stepping stones towards the optimal policy: we first find a feasible policy and then refine it to achieve better performance (e.g., shorter paths or lower resource consumption). We provide a rigorous theoretical analysis of the proposed method, including proofs of the suboptimal policy bounds and of how the embedded Lyapunov method extends these bounds. The resulting Lyapunov-informed reward offers a general reward design technique for reinforcement learning applications and effectively enhances MARL performance. The proposed method is validated through both simulation and real-world experiments.
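
The record does not spell out the Lyapunov-informed reward itself, but the general idea of goal-oriented reward design it describes can be illustrated with a minimal sketch: potential-based reward shaping built from a distance-to-goal Lyapunov candidate. The function names, the Euclidean candidate, and the discount value below are illustrative assumptions, not the paper's exact formulation.

import numpy as np

# Illustrative sketch only: the paper's actual Lyapunov-informed reward is not
# given in this record. Here, a distance-to-goal term serves as a Lyapunov
# candidate V(s), and the reward is shaped with the potential Phi(s) = -V(s)
# so that transitions moving the joint state toward the goal are rewarded.

def lyapunov_candidate(state, goal):
    # Hypothetical Lyapunov candidate: Euclidean distance from the joint state to the goal state.
    return np.linalg.norm(np.asarray(state, dtype=float) - np.asarray(goal, dtype=float))

def shaped_reward(env_reward, state, next_state, goal, gamma=0.99):
    # Potential-based shaping: r' = r + gamma * Phi(s') - Phi(s), with Phi(s) = -V(s).
    # This nudges learned policies to treat the goal states as stable equilibria.
    phi_s = -lyapunov_candidate(state, goal)
    phi_next = -lyapunov_candidate(next_state, goal)
    return env_reward + gamma * phi_next - phi_s

Potential-based shaping of this form is known to preserve the optimal policy of the underlying task, which is why it is a natural (assumed) stand-in for the goal-oriented reward design the abstract describes.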
DOI: 10.1109/TASE.2025.3603024