Efficient approximate dynamic programming based on design and analysis of computer experiments for infinite-horizon optimization



Detailed bibliography
Published in: Computers & Operations Research, Volume 124, Article 105032
Main authors: Chen, Ying; Liu, Feng; Rosenberger, Jay M.; Chen, Victoria C.P.; Kulvanitchaiyanunt, Asama; Zhou, Yuan
Format: Journal Article
Language: English
Published: New York: Elsevier Ltd (Pergamon Press Inc), 01.12.2020
ISSN: 0305-0548
Description
Summary:
• A sequential sampling algorithm for infinite-horizon approximate dynamic programming is proposed.
• A new stopping criterion that effectively identifies an optimally equivalent value function is given.
• The extrapolation issue of the approximate value function built by MARS is explored and discussed.

The approximate dynamic programming (ADP) method based on the design and analysis of computer experiments (DACE) approach has been demonstrated in the literature to be an effective method for solving multistage decision-making problems. However, this method is still inefficient for infinite-horizon optimization, given the large volume of sampling required in the state space and the need for high-quality value function identification. Therefore, we propose a sequential sampling algorithm and embed it into a DACE-based ADP method to obtain a high-quality value function approximation. Considering the limitations of the traditional stopping criterion (the Bellman error bound), we further propose a 45-degree line stopping criterion that terminates value iteration early by identifying an optimally equivalent value function. A comparison of the computational results with those of three other existing policies indicates that the proposed sampling algorithm and stopping criterion can determine a high-quality ADP policy. Finally, we discuss the extrapolation issue of the value function approximated by multivariate adaptive regression splines (MARS), the results of which further demonstrate the quality of the ADP policy generated in this study.
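As context for the stopping-criterion discussion in the abstract, the following is a minimal sketch of standard value iteration terminated by the traditional Bellman error bound, i.e. the baseline criterion the authors improve upon with their 45-degree line test. The two-state, two-action MDP, its transition tensor `P`, reward array `R`, discount `gamma`, and tolerance `tol` are all hypothetical illustration values, not taken from the paper:

```python
import numpy as np

# Hypothetical toy MDP (not from the paper): 2 states, 2 actions.
# P[a, s, s'] = probability of moving from state s to s' under action a.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # action 0
    [[0.5, 0.5], [0.7, 0.3]],   # action 1
])
R = np.array([[1.0, 0.0],       # R[a, s] = immediate reward
              [0.5, 2.0]])
gamma = 0.9                     # discount factor
tol = 1e-8                      # Bellman error bound (stopping tolerance)

V = np.zeros(2)                 # initial value function
while True:
    Q = R + gamma * P @ V       # Q[a, s]: one-step lookahead values
    V_new = Q.max(axis=0)       # Bellman optimality backup
    # Traditional stopping criterion: sup-norm Bellman error below tol.
    if np.max(np.abs(V_new - V)) < tol:
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=0)       # greedy policy w.r.t. the converged V
```

The loop illustrates why the Bellman error bound can be conservative: it runs until successive value functions are numerically indistinguishable, even though the greedy policy (which is all that is needed for an optimally equivalent value function) typically stops changing much earlier.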
DOI: 10.1016/j.cor.2020.105032