Hierarchical Deep Reinforcement Learning for Multi-robot Cooperation in Partially Observable Environment

Bibliographic Details
Published in: 2021 IEEE Third International Conference on Cognitive Machine Intelligence (CogMI), pp. 272-281
Main Authors: Liang, Zhixuan, Cao, Jiannong, Lin, Wanyu, Chen, Jinlin, Xu, Huafeng
Format: Conference Proceeding
Language: English
Published: IEEE 01.12.2021
Description
Summary: Many real-world applications require multi-robot coordination in partially-observable domains such as package delivery, search, and rescue. One typical way to address partial observability is to enable information sharing among robots via dedicated communication protocols. However, designing communication protocols is difficult due to the dynamic environments and complex interactions among robots. Existing broadcasting-based approaches are communication-inefficient, and they usually introduce redundant information that might impair the learning process and action selection. In this paper, we propose a hierarchical reinforcement learning approach, called COM-cooperative HRL, for multi-robot cooperation in a partially observable environment. Specifically, COM-cooperative HRL addresses the above gaps by introducing a partner selector to learn high-level communication strategy using short-term task-execution rewards. Besides, a low-level controller is trained to select actions based on shared information and individual observation. Extensive empirical results show a faster convergence rate and higher team performance over alternative baselines. Our approach can not only improve learning efficiency but also be adaptive to large-scale multi-robot systems.
DOI: 10.1109/CogMI52975.2021.00042
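
Illustration: The two-level structure described in the summary (a partner selector above a low-level controller) can be pictured with a minimal sketch. This is not the authors' implementation. The function names, the epsilon-greedy partner choice, the table of partner values, and the linear action scorer are all assumptions made for illustration; the paper trains both levels with deep reinforcement learning, which is omitted here.

```python
import random


def select_partner(partner_values, robot_id, epsilon, rng):
    """High-level step (hypothetical): choose one robot to receive information from.

    partner_values[j] is an assumed learned estimate of how useful robot j's
    information is to this robot. Selection is epsilon-greedy.
    """
    candidates = [j for j in range(len(partner_values)) if j != robot_id]
    if rng.random() < epsilon:
        return rng.choice(candidates)  # explore a random partner
    return max(candidates, key=lambda j: partner_values[j])  # exploit best estimate


def low_level_action(own_obs, shared_obs, weights):
    """Low-level step (hypothetical): pick an action from own plus shared observation.

    A linear scorer stands in for the trained controller: weights[a] holds one
    coefficient per feature for action a.
    """
    features = own_obs + shared_obs  # list concatenation
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return max(range(len(scores)), key=lambda a: scores[a])


def team_step(observations, partner_values, weights, epsilon=0.1, seed=0):
    """One decision step for every robot: select a partner, then select an action."""
    rng = random.Random(seed)
    partners, actions = [], []
    for i, obs in enumerate(observations):
        j = select_partner(partner_values[i], i, epsilon, rng)
        partners.append(j)
        # Only the selected partner's observation is shared, not a broadcast from all robots.
        actions.append(low_level_action(obs, observations[j], weights))
    return partners, actions
```

Each robot receives one partner's observation instead of a broadcast from the whole team, which is the communication saving the summary refers to. How the partner values and controller weights are learned from task-execution rewards is left out of this sketch.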