Direct heuristic dynamic programming based on an improved PID neural network

Published in: Journal of Control Theory and Applications, Volume 10, Issue 4, pp. 497-503
Main authors: Sun, Jian; Liu, Feng; Si, Jennie; Mei, Shengwei
Medium: Journal Article
Language: English
Publication details: Heidelberg: South China University of Technology and Academy of Mathematics and Systems Science, CAS, 01.11.2012
Author affiliations: Department of Electrical Engineering, Tsinghua University, Beijing 100084, China; Department of Electrical Engineering, Arizona State University, Tempe, AZ 85287-5706, USA
ISSN: 1672-6340, 1993-0623
Description
Summary: In this paper, an improved PID neural network (IPIDNN) structure is proposed and applied to the critic and action networks of direct heuristic dynamic programming (DHDP). As an online learning algorithm of approximate dynamic programming (ADP), DHDP has demonstrated its applicability to large state and control problems. Theoretically, the DHDP algorithm requires access to full state feedback in order to obtain solutions to the Bellman optimality equation. Unfortunately, it is not always possible to access all the states in a real system. This paper proposes a solution by suggesting an IPIDNN configuration for constructing the critic and action networks to achieve output feedback control. Since this structure can estimate the integrals and derivatives of measurable outputs, more system state information is utilized and thus better control performance is expected. Compared with the traditional PIDNN, this configuration is flexible and easy to expand. Based on this structure, a gradient descent algorithm for the IPIDNN-based DHDP is presented. Convergence is addressed both within a single learning time step and over the entire learning process. Important insights are provided to guide the implementation of the algorithm. The proposed learning controller has been applied to a cart-pole system to validate the effectiveness of the structure and the algorithm.
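To make the two mechanisms in the summary concrete, the Python sketch below expands each measurable output into proportional, integral, and derivative (PID) features, so output feedback recovers state-like information, and then applies gradient-descent updates to a critic/actor pair using the standard DHDP errors. This is a minimal sketch under stated assumptions, not the paper's implementation: the linear critic and actor, the learning rates, and names such as PIDLayer and U_c are illustrative choices made for brevity.

    import numpy as np

    class PIDLayer:
        """Expand each measurable output y_i into (P, I, D) features:
        the output itself plus discrete estimates of its integral and
        derivative (hypothetical helper, for illustration only)."""
        def __init__(self, n_outputs, dt=0.02):
            self.dt = dt
            self.integral = np.zeros(n_outputs)   # running integral estimate
            self.prev = np.zeros(n_outputs)       # last sample, for the derivative

        def __call__(self, y):
            self.integral += y * self.dt          # I: discrete integral
            deriv = (y - self.prev) / self.dt     # D: first difference
            self.prev = y.copy()
            return np.concatenate([y, self.integral, deriv])  # [P, I, D]

    # DHDP trains the critic on the temporal-difference error
    #   e_c(t) = gamma * J(t) - [J(t-1) - r(t)]
    # and the action network on e_a(t) = J(t) - U_c, where r is a binary
    # reinforcement signal and U_c is the desired ultimate cost (0 on success).
    rng = np.random.default_rng(0)
    n_feat = 3 * 2                           # two outputs -> six PID features
    w_c = rng.normal(0.0, 0.1, n_feat + 1)   # critic weights (features + action)
    w_a = rng.normal(0.0, 0.1, n_feat)       # actor weights
    gamma, lr_c, lr_a, U_c = 0.95, 0.01, 0.005, 0.0

    pid = PIDLayer(n_outputs=2)
    J_prev = 0.0
    for t in range(200):
        y = rng.normal(0.0, 0.1, 2)          # stand-in for plant outputs (e.g., cart-pole)
        x = pid(y)                           # PID feature vector
        u = float(np.tanh(w_a @ x))          # bounded control action
        r = -1.0 if abs(y[0]) > 1.0 else 0.0 # binary failure signal
        z = np.append(x, u)                  # critic input: features and action
        J = float(w_c @ z)                   # cost-to-go estimate
        e_c = gamma * J - (J_prev - r)       # critic TD error
        w_c -= lr_c * e_c * gamma * z        # gradient descent on 0.5 * e_c**2
        e_a = J - U_c                        # actor error: drive J toward U_c
        dJ_du = w_c[-1]                      # dJ/du through the critic's action weight
        w_a -= lr_a * e_a * dJ_du * (1.0 - u**2) * x  # chain rule through tanh
        J_prev = J

In the paper itself the IPIDNN plays both roles, with trainable P, I, and D hidden neurons rather than the fixed linear maps above; the sketch only shows where the integral/derivative estimates and the two gradient-descent updates enter the learning loop.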
Keywords: Approximate dynamic programming (ADP); Direct heuristic dynamic programming (DHDP); Improved PID neural network (IPIDNN)
CN: 44-1600/TP
DOI: 10.1007/s11768-012-0112-0