An Inexact Sequential Quadratic Programming Method for Learning and Control of Recurrent Neural Networks

Bibliographic details
Published in:	IEEE Transactions on Neural Networks and Learning Systems, Vol. 36, No. 2, pp. 2762-2776
Main authors:	Adeoye, Adeyemi D.; Bemporad, Alberto
Format:	Journal Article
Language:	English
Published:	United States, IEEE, 1 February 2025
Subjects:	Gauss–Newton methods; Markov decision processes; Neural networks; Numerical optimization; Optimization; Prediction algorithms; Process control; Quadratic programming; Recurrent neural networks; Recurrent neural networks (RNNs); Reinforcement learning (RL); Sequential quadratic programming (SQP); Training
ISSN:	2162-237X, 2162-2388

Description
Summary:	This article considers the two-stage approach to solving a partially observable Markov decision process (POMDP): the identification stage and the (optimal) control stage. We present an inexact sequential quadratic programming framework for recurrent neural network learning (iSQPRL) for solving the identification stage of the POMDP, in which the true system is approximated by a recurrent neural network (RNN) with dynamically consistent overshooting (DCRNN). We formulate the learning problem as a constrained optimization problem and study the quadratic programming (QP) subproblem with a convergence analysis under a restarted Krylov-subspace iterative scheme that implicitly exploits the structure of the associated Karush-Kuhn-Tucker (KKT) subsystem. In the control stage, where a feedforward neural network (FNN) controller is designed on top of the RNN model, we adapt a generalized Gauss-Newton (GGN) algorithm that exploits useful approximations to the curvature terms of the training data and selects its mini-batch step size using a known property of some regularization function. Simulation results are provided to demonstrate the effectiveness of our approach.
DOI:	10.1109/TNNLS.2024.3354855