Cardiff University | Prifysgol Caerdydd ORCA

Hierarchical reinforcement learning with universal policies for multi-step robotic manipulation

Yang, Xintong ORCID: https://orcid.org/0000-0002-7612-614X, Ji, Ze ORCID: https://orcid.org/0000-0002-8968-9902, Wu, Jing ORCID: https://orcid.org/0000-0001-5123-9861, Lai, Yu-kun ORCID: https://orcid.org/0000-0002-2094-5680, Wei, Changyun, Liu, Guoliang and Setchi, Rossitza ORCID: https://orcid.org/0000-0002-7207-6544 2022. Hierarchical reinforcement learning with universal policies for multi-step robotic manipulation. IEEE Transactions on Neural Networks and Learning Systems 33 (9) , pp. 4727-4741. 10.1109/TNNLS.2021.3059912

PDF (Accepted Post-Print Version), 4 MB
PDF (Supplemental Material), 4 MB

Abstract

Multistep tasks, such as block stacking or parts (dis)assembly, are complex for autonomous robotic manipulation. A robotic system for such tasks would need to hierarchically combine motion control at a lower level and symbolic planning at a higher level. Recently, reinforcement learning (RL)-based methods have been shown to handle robotic motion control with better flexibility and generalizability. However, these methods have limited capability to handle such complex tasks involving planning and control with many intermediate steps over a long time horizon. First, current RL systems cannot achieve varied outcomes by planning over intermediate steps (e.g., stacking blocks in different orders). Second, the exploration efficiency of learning multistep tasks is low, especially when rewards are sparse. To address these limitations, we develop a unified hierarchical reinforcement learning framework, named Universal Option Framework (UOF), to enable the agent to learn varied outcomes in multistep tasks. To improve learning efficiency, we train both symbolic planning and kinematic control policies in parallel, aided by two proposed techniques: 1) an auto-adjusting exploration strategy (AAES) at the low level to stabilize the parallel training, and 2) abstract demonstrations at the high level to accelerate convergence. To evaluate its performance, we performed experiments on various multistep block-stacking tasks with blocks of different shapes and combinations and with different degrees of freedom for robot control. The results demonstrate that our method can accomplish multistep manipulation tasks more efficiently and stably, and with significantly less memory consumption.
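The abstract describes a two-level structure: a high-level policy selects sub-goals (options) according to a symbolic plan, while a single universal (goal-conditioned) low-level policy executes whichever sub-goal it is given. A minimal sketch of that structure, using a toy 1-D world and tabular Q-learning with sparse rewards in place of the paper's actual environments and algorithms (all names and parameters here are illustrative, not taken from the paper):

```python
import random

# Toy 1-D "manipulation" world: the agent occupies a position in 0..N-1.
# Low level: one goal-conditioned Q-table Q(state, goal, action) trained with a
# sparse reward, so a single policy serves every sub-goal ("universal policy").
# High level: steps through a fixed symbolic plan, invoking the low-level
# policy as an option for each sub-goal.

N = 5
ACTIONS = [-1, +1]  # move left / move right

def step(state, action):
    """Environment transition: move, clipped to the world's bounds."""
    return max(0, min(N - 1, state + action))

def train_universal_policy(episodes=2000, eps=0.2, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning over (state, goal) pairs with sparse reward."""
    rng = random.Random(seed)
    q = {(s, g, a): 0.0 for s in range(N) for g in range(N) for a in ACTIONS}
    for _ in range(episodes):
        s, g = rng.randrange(N), rng.randrange(N)  # random start and sub-goal
        for _ in range(2 * N):
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: q[(s, g, x)])
            s2 = step(s, a)
            r = 1.0 if s2 == g else 0.0  # sparse reward: only on reaching g
            bootstrap = 0.0 if s2 == g else gamma * max(q[(s2, g, x)] for x in ACTIONS)
            q[(s, g, a)] += alpha * (r + bootstrap - q[(s, g, a)])
            s = s2
            if s == g:
                break
    return q

def run_option(q, s, g, max_steps=2 * N):
    """Low-level option: act greedily w.r.t. the universal policy until g is reached."""
    for _ in range(max_steps):
        if s == g:
            return s
        s = step(s, max(ACTIONS, key=lambda a: q[(s, g, a)]))
    return s

q = train_universal_policy()
plan = [2, 4, 1]  # high-level "symbolic plan": an ordered sequence of sub-goals
state = 0
for sub_goal in plan:
    state = run_option(q, state, sub_goal)
print(state)  # final position after executing the plan
```

Because the low-level policy is conditioned on the goal, the high-level plan can be reordered freely (e.g. `[4, 1, 2]`) without retraining, which is the property that lets one framework produce varied multi-step outcomes.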

Item Type: Article
Date Type: Publication
Status: Published
Schools: Engineering
Publisher: IEEE
ISSN: 2162-237X
Date of First Compliant Deposit: 18 February 2021
Date of Acceptance: 11 February 2021
Last Modified: 28 Mar 2024 12:14
URI: https://orca.cardiff.ac.uk/id/eprint/138649

Citation Data

Cited 11 times in Scopus.
