DOZE: A Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments

Bibliographic Details
Title: DOZE: A Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments
Authors: Ji Ma, Hongming Dai, Yao Mu, Pengying Wu, Hao Wang, Xiaowei Chi, Yang Fei, Shanghang Zhang, Chang Liu
Source: IEEE Robotics and Automation Letters, vol. 9, pp. 7389-7396
Publication Status: Published (preprint available on arXiv)
Publisher Information: Institute of Electrical and Electronics Engineers (IEEE), 2024.
Publication Year: 2024
Subject Terms: FOS: Computer and information sciences, Computer Science - Robotics (cs.RO), Computer Science - Computer Vision and Pattern Recognition (cs.CV), Embodied AI, Zero-shot object navigation, Semantic scene understanding, Data sets for robot learning, Data sets for robotic vision
Description: Zero-Shot Object Navigation (ZSON) requires agents to autonomously locate and approach unseen objects in unfamiliar environments and has emerged as a particularly challenging task within the domain of Embodied AI. Existing datasets for developing ZSON algorithms lack consideration of dynamic obstacles, object attribute diversity, and scene texts, and thus exhibit noticeable discrepancies from real-world situations. To address these issues, we propose a Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments (DOZE), which comprises ten high-fidelity 3D scenes with over 18k tasks and aims to mimic complex, dynamic real-world scenarios. Specifically, DOZE scenes feature multiple moving humanoid obstacles, a wide array of open-vocabulary objects, diverse objects with distinct attributes, and valuable textual hints. Moreover, unlike existing datasets that only provide collision checking between the agent and static obstacles, we enhance DOZE by integrating capabilities for detecting collisions between the agent and moving obstacles. This novel functionality enables the evaluation of agents' collision-avoidance abilities in dynamic environments. We test four representative ZSON methods on DOZE, revealing substantial room for improvement in existing approaches concerning navigation efficiency, safety, and object recognition accuracy. Our dataset can be found at https://DOZE-Dataset.github.io/.
This version of the paper has been accepted for publication in IEEE Robotics and Automation Letters (RA-L).
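The dynamic collision checking highlighted in the abstract is what distinguishes DOZE from static-only benchmarks: obstacle poses change every simulation step, so collisions must be re-evaluated continuously rather than precomputed against a fixed map. As a minimal sketch of the idea (this is not the DOZE API; the `Circle` footprint, the obstacle list, and all names here are hypothetical simplifications assuming 2-D circular footprints):

```python
import math
from dataclasses import dataclass


@dataclass
class Circle:
    """2-D footprint approximation (hypothetical; DOZE's actual geometry may differ)."""
    x: float
    y: float
    radius: float


def collides(a: Circle, b: Circle) -> bool:
    # Two circular footprints overlap when the center distance
    # is less than the sum of their radii.
    return math.hypot(a.x - b.x, a.y - b.y) < a.radius + b.radius


def dynamic_collision_check(agent: Circle, obstacles: list[Circle]) -> bool:
    """Check the agent against every obstacle at its current pose.

    Unlike static collision checking, the obstacle poses must be
    re-sampled every simulation step because humanoid obstacles move.
    """
    return any(collides(agent, ob) for ob in obstacles)


# Usage sketch: run the check once per simulation step, after
# updating the humanoid obstacle poses for that step.
agent = Circle(0.0, 0.0, 0.25)
humanoids = [Circle(0.4, 0.1, 0.3), Circle(3.0, 2.0, 0.3)]
print(dynamic_collision_check(agent, humanoids))  # True: the first humanoid overlaps
```

A real simulator would typically use mesh- or capsule-based colliders queried through a physics engine rather than hand-rolled circle tests, but the per-step structure of the check is the same.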
Document Type: Article
ISSN: 2377-3774
DOI: 10.1109/LRA.2024.3426381
DOI (arXiv): 10.48550/arXiv.2402.19007
Access URL: http://arxiv.org/abs/2402.19007
Rights: IEEE Copyright
arXiv Non-Exclusive Distribution
Accession Number: edsair.doi.dedup.....29d1205d8a44407ffc62e203c19d7fbe
Database: OpenAIRE