DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics

Bibliographic Details
Published in:	IEEE Robotics and Automation Letters, Vol. 8, No. 7, pp. 3956-3963
Main Authors:	Kapelyukh, Ivan; Vosylius, Vitalis; Johns, Edward
Format:	Journal Article
Language:	English
Published:	Piscataway: IEEE, 1 July 2023; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	AI-based methods; Big Data in robotics and automation; deep learning in grasping and manipulation; Image segmentation; Pipelines; Predictive models; Robotics; Robots; Task analysis; Training; Visualization; Webs
ISSN:	2377-3766

Description
Abstract:	We introduce the first work to explore web-scale diffusion models for robotics. DALL-E-Bot enables a robot to rearrange objects in a scene, by first inferring a text description of those objects, then generating an image representing a natural, human-like arrangement of those objects, and finally physically arranging the objects according to that goal image. We show that this is possible zero-shot using DALL-E, without needing any further example arrangements, data collection, or training. DALL-E-Bot is fully autonomous and is not restricted to a pre-defined set of objects or scenes, thanks to DALL-E's web-scale pre-training. Encouraging real-world results, with both human studies and objective metrics, show that integrating web-scale diffusion models into robotics pipelines is a promising direction for scalable, unsupervised robot learning.
DOI:	10.1109/LRA.2023.3272516