Making Sense of Vision and Touch: Learning Multimodal Representations for Contact-Rich Tasks

Bibliographic Details
Published in:	IEEE Transactions on Robotics, Vol. 36, no. 3, pp. 582-596
Main Authors:	Lee, Michelle A., Zhu, Yuke, Zachares, Peter, Tan, Matthew, Srinivasan, Krishnan, Savarese, Silvio, Fei-Fei, Li, Garg, Animesh, Bohg, Jeannette
Format:	Journal Article
Language:	English
Published:	New York: IEEE, 01.06.2020; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	Algorithms; Clearances; Computer simulation; Control systems design; Deep learning in robotics and automation; Haptic interfaces; Machine learning; perception for grasping and manipulation; Reinforcement learning; Representations; Robot sensing systems; Robots; sensor fusion; sensor-based control; Solid modeling; Task analysis; Visualization
ISSN:	1552-3098, 1941-0468

Description
Summary:	Contact-rich manipulation tasks in unstructured environments often require both haptic and visual feedback. It is nontrivial to manually design a robot controller that combines these modalities, which have very different characteristics. While deep reinforcement learning has shown success in learning control policies for high-dimensional inputs, these algorithms are generally intractable to train directly on real robots due to sample complexity. In this article, we use self-supervision to learn a compact and multimodal representation of our sensory inputs, which can then be used to improve the sample efficiency of our policy learning. Evaluating our method on a peg insertion task, we show that it generalizes over varying geometries, configurations, and clearances, while being robust to external perturbations. We also systematically study different self-supervised learning objectives and representation learning architectures. Results are presented in simulation and on a physical robot.
DOI:	10.1109/TRO.2019.2959445