Hierarchical Deep Reinforcement Learning for Multi-robot Cooperation in Partially Observable Environment

Bibliographic details
Published in: 2021 IEEE Third International Conference on Cognitive Machine Intelligence (CogMI), pp. 272-281
Main authors: Liang, Zhixuan; Cao, Jiannong; Lin, Wanyu; Chen, Jinlin; Xu, Huafeng
Format: Conference paper
Language: English
Published: IEEE, 1 December 2021
Description
Summary: Many real-world applications require multi-robot coordination in partially observable domains such as package delivery and search and rescue. One typical way to address partial observability is to enable information sharing among robots via dedicated communication protocols. However, designing communication protocols is difficult because of dynamic environments and complex interactions among robots. Existing broadcasting-based approaches are communication-inefficient, and they usually introduce redundant information that might impair the learning process and action selection. In this paper, we propose a hierarchical reinforcement learning approach, called COM-cooperative HRL, for multi-robot cooperation in a partially observable environment. Specifically, COM-cooperative HRL addresses these gaps by introducing a partner selector that learns a high-level communication strategy using short-term task-execution rewards. In addition, a low-level controller is trained to select actions based on shared information and individual observations. Extensive empirical results show a faster convergence rate and higher team performance than alternative baselines. Our approach not only improves learning efficiency but also adapts to large-scale multi-robot systems.
DOI: 10.1109/CogMI52975.2021.00042