Stochastic optimization of multireservoir systems via reinforcement learning
| Published in: | Water Resources Research, Vol. 43, no. 11 |
|---|---|
| Main Authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: | Blackwell Publishing Ltd, 01.11.2007 |
| Subjects: | |
| ISSN: | 0043-1397, 1944-7973 |
| Summary: | Although several variants of stochastic dynamic programming have been applied to optimal operation of multireservoir systems, they have been plagued by a high-dimensional state space and the inability to accurately incorporate the stochastic environment as characterized by temporally and spatially correlated hydrologic inflows. Reinforcement learning has emerged as an effective approach to solving sequential decision problems by combining concepts from artificial intelligence, cognitive science, and operations research. A reinforcement learning system has a mathematical foundation similar to dynamic programming and Markov decision processes, with the goal of maximizing the long-term return conditioned on the state of the system environment and the immediate reward obtained from operational decisions. Reinforcement learning can employ Monte Carlo simulation when transition probabilities and rewards are not explicitly known a priori. The Q-Learning method in reinforcement learning is demonstrated on the two-reservoir Geum River system, South Korea, and is shown to outperform implicit stochastic dynamic programming and sampling stochastic dynamic programming methods. |
| DOI: | 10.1029/2006WR005627 |
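To illustrate the idea the summary describes, the sketch below shows tabular Q-Learning on a toy single-reservoir operation problem. This is not the paper's model of the two-reservoir Geum River system: the storage discretization, inflow distribution, demand target, and reward function here are all illustrative assumptions. The key feature it does demonstrate is that the Q-update learns from Monte Carlo simulated transitions without requiring explicit transition probabilities.

```python
import random

# Hypothetical toy problem: a single reservoir with discrete storage
# levels 0..CAPACITY, a fixed release demand, and random inflows.
# None of these numbers come from the paper; they are assumptions.
CAPACITY = 10        # discrete storage levels 0..10
ACTIONS = range(4)   # candidate releases: 0..3 units per period
DEMAND = 2           # target release per period
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1  # learning rate, discount, exploration

def step(storage, release):
    """Simulate one period: release water, receive a random inflow, spill excess."""
    release = min(release, storage)          # cannot release more than is stored
    inflow = random.choice([0, 1, 2, 3])     # stochastic inflow (Monte Carlo sample)
    next_storage = min(storage - release + inflow, CAPACITY)
    reward = -abs(release - DEMAND)          # penalize deviation from demand
    return next_storage, reward

def train(episodes=2000, horizon=50, seed=0):
    random.seed(seed)
    Q = {(s, a): 0.0 for s in range(CAPACITY + 1) for a in ACTIONS}
    for _ in range(episodes):
        s = CAPACITY // 2
        for _ in range(horizon):
            # epsilon-greedy action selection
            if random.random() < EPS:
                a = random.choice(list(ACTIONS))
            else:
                a = max(ACTIONS, key=lambda x: Q[(s, x)])
            s2, r = step(s, a)
            # Q-Learning update: bootstrap from the simulated next state,
            # with no explicit transition-probability model required
            Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, x)] for x in ACTIONS)
                                  - Q[(s, a)])
            s = s2
    return Q

Q = train()
# Greedy policy extracted from the learned Q-table
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(CAPACITY + 1)}
```

With the deviation-from-demand reward used here, the greedy policy at any storage level of 2 or more should settle on releasing the demand target of 2 units.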