Reinforcement learning control with function approximation via multivariate simplex splines


Bibliographic details
Published in: International journal of adaptive control and signal processing, Volume 39, Issue 10, pp. 2040-2061
Main authors: Feng, Yiting; Zhou, Ye; Ho, Hann Woei; Mat Isa, Nor Ashidi
Format: Journal Article
Language: English
Published: Bognor Regis: Wiley Subscription Services, Inc, 1 October 2025
ISSN: 0890-6327, 1099-1115
Description
Summary: In the field of optimal control for continuous nonlinear systems, function approximation methods are often employed to overcome the curse of dimensionality. Compared to global function approximators such as neural networks, multivariate splines can be easily evaluated and adapted on a local basis, and they are linear in the parameters. In this work, a multivariate-spline-based reinforcement learning (RL) strategy is proposed for solving the continuous-time nonlinear control problem. Based on the classic value iteration method, multivariate splines are integrated into RL algorithms to approximate continuous value functions and policy functions from discrete action and value samples. The resulting splines, with updated coefficients, can then be used for continuous control of nonlinear systems. In the simulation experiment, the performance of the spline-based RL control is evaluated on an under-actuated inverted pendulum. The proposed method is compared with a value-iteration-based discrete control strategy and a neural-network-based continuous control strategy. The simulation results indicate that the proposed method achieves better control performance, with fewer state oscillations, lower energy consumption, and shorter convergence time than discrete value iteration and neural-network-based RL. The adoption of simplex splines also improves function approximation efficiency, requiring less computation time than neural network optimization.
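The "linearity in the parameters" mentioned in the summary can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single triangle, a degree-1 spline in barycentric (B-form) coordinates, and made-up sample data. The function names `barycentric`, `fit_linear_spline`, and `evaluate` are hypothetical. Because the spline value is a dot product of barycentric coordinates and coefficients, fitting the coefficients to value samples reduces to ordinary linear least squares.

```python
import numpy as np

def barycentric(vertices, x):
    """Barycentric coordinates of point x with respect to one simplex.

    vertices: (n+1, n) array of simplex vertices; x: (n,) point.
    """
    v0 = vertices[0]
    # Solve sum_i b_i (v_i - v_0) = x - v_0 for b_1..b_n.
    A = (vertices[1:] - v0).T
    b_rest = np.linalg.solve(A, x - v0)
    # The coordinates sum to one, which fixes b_0.
    return np.concatenate(([1.0 - b_rest.sum()], b_rest))

def fit_linear_spline(vertices, points, values):
    """Least-squares coefficients of a degree-1 spline on one simplex."""
    # Each row holds the barycentric coordinates of one sample point,
    # so the model values = B @ coeffs is linear in the coefficients.
    B = np.array([barycentric(vertices, p) for p in points])
    coeffs, *_ = np.linalg.lstsq(B, values, rcond=None)
    return coeffs

def evaluate(vertices, coeffs, x):
    """Evaluate the degree-1 spline at x (dot product with coefficients)."""
    return float(barycentric(vertices, x) @ coeffs)

# Illustrative data (assumed, not from the paper): samples of a linear
# function f(x, y) = 1 + 2x - 3y taken inside the unit triangle.
triangle = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
samples = np.array([[0.2, 0.2], [0.6, 0.1], [0.1, 0.7], [0.3, 0.3]])
targets = 1.0 + 2.0 * samples[:, 0] - 3.0 * samples[:, 1]

coeffs = fit_linear_spline(triangle, samples, targets)
value = evaluate(triangle, coeffs, np.array([0.25, 0.25]))
```

For a degree-1 spline the fitted coefficients are the function values at the triangle's vertices, so here `coeffs` should come out close to `[1, 3, -2]`. Higher-degree splines and multi-simplex triangulations with continuity constraints, as used in the article, need more machinery than this sketch shows.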
DOI: 10.1002/acs.3579