Binarized P-Network: Deep Reinforcement Learning of Robot Control from Raw Images on FPGA

Bibliographic Details
Published in: IEEE Robotics and Automation Letters, Vol. 6, No. 4, pp. 8545-8552
Main Authors: Kadokawa, Yuki; Tsurumine, Yoshihisa; Matsubara, Takamitsu
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.10.2021
ISSN: 2377-3766
Description
Summary: This letter explores a deep reinforcement learning (DRL) approach for designing image-based control for edge robots to be implemented on Field Programmable Gate Arrays (FPGAs). Although FPGAs are more power-efficient than CPUs and GPUs, a typical DRL method cannot be applied directly, since FPGAs are composed of many Logic Blocks (LBs) that are fast at logical operations but slow at real-number operations. To cope with this problem, we propose a novel DRL algorithm called Binarized P-Network (BPN), which learns image-input control policies using Binarized Convolutional Neural Networks (BCNNs). To alleviate the instability of reinforcement learning caused by a BCNN's low function approximation accuracy, our BPN adopts a robust value update scheme called Conservative Value Iteration, which is tolerant of function approximation errors. We confirmed the BPN's effectiveness through applications to a visual tracking task in simulation and in real-robot experiments on an FPGA.
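Note: The abstract names two techniques, binarized network layers and Conservative Value Iteration. The short NumPy sketch below is not the authors' code; under the stated assumptions it only illustrates (1) sign-binarization of weights and activations, which reduces multiply-accumulate to +/-1 products amenable to XNOR/popcount logic on FPGA logic blocks, and (2) a tabular conservative-value-iteration style backup following Kozuno et al.'s formulation, whose soft log-sum-exp value is tolerant of approximation error. The function names, coefficients (alpha, beta), and the tabular setting are illustrative assumptions and may differ from the paper.

    import numpy as np

    def binarize(x):
        # Map real values to {-1, +1}; on an FPGA this becomes a 1-bit signal.
        return np.where(x >= 0.0, 1.0, -1.0)

    def binary_dense(x_real, w_real):
        # Forward pass of a binarized fully connected layer: both activations
        # and weights are constrained to +/-1, so every product is +/-1 and
        # the dot product can be realized with XNOR + popcount in hardware.
        return binarize(x_real) @ binarize(w_real)

    def cvi_backup(psi, reward, trans, gamma=0.99, alpha=0.9, beta=10.0):
        # One tabular conservative-value-iteration style backup (illustrative).
        #   psi:    (S, A) preference table
        #   reward: (S, A) reward table
        #   trans:  (S, A, S) transition probabilities
        #   alpha:  conservativeness coefficient, beta: inverse temperature
        # Soft state value via a numerically stable log-sum-exp over actions.
        m = psi.max(axis=1)
        v = m + np.log(np.exp(beta * (psi - m[:, None])).sum(axis=1)) / beta
        # Conservative update: keep a fraction of the previous advantage,
        # which damps the effect of function-approximation errors.
        return alpha * (psi - v[:, None]) + reward + gamma * (trans @ v)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Binarized layer demo: 8 inputs -> 4 outputs.
        print(binary_dense(rng.normal(size=8), rng.normal(size=(8, 4))))
        # CVI demo on a random 5-state, 3-action MDP.
        S, A = 5, 3
        trans = rng.random((S, A, S))
        trans /= trans.sum(axis=2, keepdims=True)
        psi, reward = np.zeros((S, A)), rng.random((S, A))
        for _ in range(200):
            psi = cvi_backup(psi, reward, trans)
        print(psi.max(axis=1))  # preference values per state after 200 backups

In the paper these ideas are combined into image-input policies learned with binarized convolutional networks; the dense layer and tabular MDP above are simplifications for illustration only.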
DOI: 10.1109/LRA.2021.3111416