Binarized P-Network: Deep Reinforcement Learning of Robot Control from Raw Images on FPGA

Detailed bibliography
Published in:	IEEE Robotics and Automation Letters, Vol. 6, No. 4, pp. 8545-8552
Main authors:	Kadokawa, Yuki; Tsurumine, Yoshihisa; Matsubara, Takamitsu
Medium:	Journal Article
Language:	English
Published:	Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 1 October 2021
Subjects:	Algorithms; Approximation; Artificial neural networks; Control stability; Deep learning; Embedded systems for robotic and automation; Field programmable gate arrays; Function approximation; Hardware-software integration in robotics; Low speed; Machine learning; Mathematical analysis; Optical tracking; Real-time systems; Reinforcement learning; Robot control; Robots; Servers; Task analysis; Visual tasks
ISSN:	2377-3766

Description
Abstract:	This letter explores a deep reinforcement learning (DRL) approach for designing image-based control for edge robots to be implemented on Field Programmable Gate Arrays (FPGAs). Although FPGAs are more power-efficient than CPUs and GPUs, a typical DRL method cannot be applied to them, because FPGAs are composed of many Logic Blocks (LBs) that perform logical operations at high speed but real-number operations only at low speed. To cope with this problem, we propose a novel DRL algorithm called Binarized P-Network (BPN), which learns image-input control policies using Binarized Convolutional Neural Networks (BCNNs). To alleviate the instability of reinforcement learning caused by a BCNN with low function approximation accuracy, our BPN adopts a robust value update scheme called Conservative Value Iteration, which is tolerant of function approximation errors. We confirmed the BPN's effectiveness through applications to a visual tracking task in simulation and in real-robot experiments with an FPGA.
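The abstract rests on FPGA logic blocks being much faster at bitwise logic than at real-number arithmetic. A common way binarized networks exploit this (general binarized-network practice, not a detail this record confirms for BPN) is to constrain weights and activations to +1/-1, pack them as bits, and replace each multiply-accumulate with an XNOR followed by a population count. The Python sketch below checks that identity for a single dot product. The function names are hypothetical, and the sign convention (zero maps to +1) is an assumption.

```python
def binarize(values):
    """Sign-binarize real values to bits: 1 encodes +1 (x >= 0), 0 encodes -1 (x < 0)."""
    return [1 if x >= 0 else 0 for x in values]


def pack_bits(bits):
    """Pack a list of 0/1 bits into one integer word (bits[i] becomes bit i)."""
    word = 0
    for i, bit in enumerate(bits):
        word |= bit << i
    return word


def xnor_popcount_dot(a_word, b_word, n):
    """Dot product of two length-n {-1, +1} vectors stored as packed bits.

    XNOR marks positions where the signs agree (product +1); every other
    position contributes -1, so dot = matches - (n - matches) = 2 * matches - n.
    """
    mask = (1 << n) - 1                                   # keep only the n valid bit positions
    matches = bin(~(a_word ^ b_word) & mask).count("1")   # popcount of XNOR
    return 2 * matches - n


# Hypothetical activations and weights; only their signs matter after binarization.
activations = [0.3, -1.2, 0.8, -0.1]
weights = [0.5, -0.7, 0.2, 0.9]
print(xnor_popcount_dot(pack_bits(binarize(activations)), pack_bits(binarize(weights)), 4))  # 2
```

On an FPGA the XNOR and popcount would be realized directly in logic blocks; Python is used here only to verify the arithmetic.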
DOI:	10.1109/LRA.2021.3111416
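The abstract names Conservative Value Iteration (CVI) as the update that tolerates function-approximation error. As we understand CVI from the wider reinforcement-learning literature, it maintains a preference function Ψ and applies a softmax-based, gap-increasing update. The exact form used here is our assumption and is not stated in this record: Ψ_{k+1}(s,a) = r(s,a) + γ Σ_{s'} P(s'|s,a) m_β Ψ_k(s') + α (Ψ_k(s,a) - m_β Ψ_k(s)), where m_β is a Boltzmann softmax over actions. The sketch is tabular with a known model, standing in for the BCNN approximator that BPN actually learns. The toy MDP and the values of γ, α and β are hypothetical.

```python
import math


def boltzmann_softmax(values, beta):
    """Boltzmann-weighted average of values; approaches max(values) as beta grows."""
    top = max(values)                                    # shift for numerical stability
    weights = [math.exp(beta * (v - top)) for v in values]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def cvi_step(psi, rewards, transitions, gamma, alpha, beta):
    """One tabular, known-model update of the assumed CVI form:
    psi'(s,a) = r(s,a) + gamma * sum_s' P(s'|s,a) * m(s') + alpha * (psi(s,a) - m(s)),
    where m(s) is the Boltzmann softmax of psi(s, .)."""
    m = [boltzmann_softmax(row, beta) for row in psi]
    new_psi = []
    for s, row in enumerate(psi):
        new_row = []
        for a, value in enumerate(row):
            expected_next = sum(p * m[s2] for s2, p in enumerate(transitions[s][a]))
            new_row.append(rewards[s][a] + gamma * expected_next + alpha * (value - m[s]))
        new_psi.append(new_row)
    return new_psi


# Hypothetical 2-state, 2-action MDP: action 0 stays, action 1 switches state;
# staying in state 1 yields reward 1, everything else yields 0.
rewards = [[0.0, 0.0], [1.0, 0.0]]
transitions = [[[1.0, 0.0], [0.0, 1.0]],   # transitions[s][a][s'] = P(s' | s, a)
               [[0.0, 1.0], [1.0, 0.0]]]
psi = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(300):
    psi = cvi_step(psi, rewards, transitions, gamma=0.9, alpha=0.9, beta=10.0)
greedy = [row.index(max(row)) for row in psi]
print(greedy)  # [1, 0]: switch out of state 0, stay in state 1
```

With alpha = 0 and a large beta, the update reduces to ordinary value iteration. A positive alpha widens the gap between the best action and the others, and softmax smoothing replaces the hard max. In the CVI literature, this combination is argued to make the update robust to approximation error, which is the property the abstract relies on.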