Bibliographic Details
| Title: |
Self-supervised reinforcement learning for multi-step object manipulation skills. |
| Authors: |
Wang, Jiaqi, Chen, Chuxin, Liu, Jingwei, Du, Guanglong, Zhu, Xiaojun, Guan, Quanlong, Qiu, Xiaojian |
| Source: |
Industrial Robot; 2025, Vol. 52 Issue 6, p853-865, 13p |
| Abstract: |
Purpose: The purpose of this study is to address the challenge of object manipulation in scenarios where the target is not explicitly defined, requiring robots to plan efficiently the sequence of actions for picking, placing and positioning objects. The aim is to develop a multistep skill learning method that integrates perception with a set of primitive actions, including a novel orienting action, to enable robots to perform complex tasks that require multistep planning and interaction with various objects in cluttered and unstructured environments.

Design/methodology/approach: To achieve this purpose, the authors propose a pipeline that decomposes the object manipulation task into three independent stages, each trained end-to-end from raw visual inputs using off-policy reinforcement learning algorithms. The Q-learning algorithm is used to simultaneously train, from scratch, fully convolutional neural networks for the primitive actions – grasping, pushing, placing and orienting. The framework is designed to be modular, allowing easy extension to multistep manipulation tasks.

Findings: The findings demonstrate that robots can learn complex behaviors, as shown in both simulated and real-world experiments. In simulation, the robot achieved a block-stacking success rate of up to 98% during testing. When the model was transferred to a real Universal Robots UR3 robot using effective domain randomization, the robot achieved a 100% completion rate with convex objects and a 92% completion rate with various objects not seen during training.

Originality/value: The authors develop a novel multistep skill learning method that integrates perception with multiple primitive actions, including a new orienting action, and uses off-policy reinforcement learning algorithms for end-to-end training. The modular design of the framework allows easy extension to more complex manipulation tasks, and the encouraging results in both simulated and real-world experiments demonstrate significant improvements over current long-term planning methods. [ABSTRACT FROM AUTHOR] |
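The design described in the abstract – Q-learning over pixel-wise value maps for a small set of primitive actions – can be illustrated with a minimal sketch. This is not the authors' implementation (which trains fully convolutional networks on visual input); it is a hedged toy version in which each primitive's Q-function is a plain array over workspace pixels, the greedy action is the highest-valued (primitive, pixel) pair, and a one-step Q-learning update adjusts the chosen entry. The class name `PixelQLearner` and all parameter names are illustrative assumptions.

```python
import numpy as np

# Illustrative primitives matching the abstract's action set.
PRIMITIVES = ("grasp", "push", "place", "orient")

class PixelQLearner:
    """Toy pixel-wise Q-learning over primitive actions (a sketch,
    not the paper's FCN-based implementation)."""

    def __init__(self, height, width, alpha=0.1, gamma=0.9):
        # One Q-map per primitive, the size of the workspace grid.
        self.q = {p: np.zeros((height, width)) for p in PRIMITIVES}
        self.alpha = alpha  # learning rate
        self.gamma = gamma  # discount factor

    def select_action(self):
        # Greedy choice over all primitives and all pixels.
        best = max(PRIMITIVES, key=lambda p: self.q[p].max())
        pixel = np.unravel_index(np.argmax(self.q[best]), self.q[best].shape)
        return best, pixel

    def update(self, primitive, pixel, reward, done):
        # One-step Q-learning target: r + gamma * max Q(next).
        # (State transitions are elided in this stateless sketch.)
        future = 0.0 if done else max(self.q[p].max() for p in PRIMITIVES)
        td_error = reward + self.gamma * future - self.q[primitive][pixel]
        self.q[primitive][pixel] += self.alpha * td_error

learner = PixelQLearner(4, 4)
learner.update("grasp", (1, 2), reward=1.0, done=True)
primitive, pixel = learner.select_action()
```

In the paper's setting the Q-maps would instead be produced by the fully convolutional networks from a camera image, so the "table" above stands in for one forward pass per primitive.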
Copyright of Industrial Robot is the property of Emerald Publishing Limited and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.) |
| Database: |
Complementary Index |