Reinforcement learning algorithm for non-stationary environments

Detailed bibliography

Published in:	Applied Intelligence (Dordrecht, Netherlands), Vol. 50, No. 11, pp. 3590-3606
Main authors:	Padakandla, Sindhu; K. J., Prabuchandran; Bhatnagar, Shalabh
Medium:	Journal Article
Language:	English
Published:	New York: Springer US, 1 November 2020; Springer Nature B.V.
Subjects:	Algorithms; Artificial Intelligence; Computer Science; Decisions; Energy management; Machine learning; Machines; Manufacturing; Markov processes; Mechanical Engineering; Nonstationary environments; Optimization; Processes; Robot control; Robotics; Signal processing; Traffic control; Traffic signals; Markov decision processes; Change detection; Non-Stationary environments; Reinforcement learning
ISSN:	0924-669X, 1573-7497

Description
Abstract:	Reinforcement learning (RL) methods learn optimal decisions under the assumption that the environment is stationary. However, this assumption is very restrictive. In many real-world problems, such as traffic signal control and robotics, one often encounters non-stationary environments, and in these scenarios RL methods yield sub-optimal decisions. In this paper, we therefore consider the problem of developing RL methods that obtain optimal decisions in a non-stationary environment. The goal is to maximize the long-term discounted reward accrued when the underlying model of the environment changes over time. To achieve this, we first adapt a change point algorithm to detect changes in the statistics of the environment and then develop an RL algorithm that maximizes the long-run reward accrued. We show that our change point method detects changes in the model of the environment effectively and thus helps the RL algorithm maximize the long-run reward. We further validate the effectiveness of the proposed solution on non-stationary random Markov decision processes, a sensor energy management problem, and a traffic signal control problem.
DOI:	10.1007/s10489-020-01758-5
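The abstract describes a two-part approach: detect changes in the statistics of the environment, then keep learning with an RL algorithm. As a rough illustration only, the sketch below pairs tabular Q-learning with a two-sided CUSUM detector run on per-action rewards, and starts a fresh Q-table whenever a change is flagged. This is not the authors' method. CUSUM is a stand-in for the change point algorithm the paper adapts, and the toy environment (a hypothetical one-state, two-action MDP whose reward means swap halfway through) is invented for the example. All names and parameters (`Cusum`, `run`, `drift`, `threshold`, `warmup`, `change_at`) are likewise assumptions of this sketch.

```python
import random
from collections import defaultdict


class Cusum:
    """Two-sided CUSUM detector for a shift in the mean of a scalar stream.

    Assumption of this sketch: the pre-change mean is estimated from the
    first `warmup` samples (not taken from the paper).
    """

    def __init__(self, drift=0.5, threshold=3.0, warmup=20):
        self.drift, self.threshold, self.warmup = drift, threshold, warmup
        self.n, self.mean = 0, 0.0
        self.s_pos = self.s_neg = 0.0

    def update(self, x):
        """Feed one sample; return True if a change in the mean is flagged."""
        if self.n < self.warmup:
            self.n += 1
            self.mean += (x - self.mean) / self.n  # running mean of warm-up samples
            return False
        # Accumulate evidence of an upward or downward shift beyond `drift`.
        self.s_pos = max(0.0, self.s_pos + x - self.mean - self.drift)
        self.s_neg = max(0.0, self.s_neg + self.mean - x - self.drift)
        return self.s_pos > self.threshold or self.s_neg > self.threshold


def run(steps=2000, change_at=1000, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Q-learning on a hypothetical one-state, two-action MDP whose reward
    means swap at step `change_at`; one Q-table per detected context."""
    rng = random.Random(seed)
    means_before, means_after = [1.0, 0.0], [0.0, 1.0]
    q_tables = [defaultdict(float)]   # Q-table for each detected context
    detectors = [Cusum(), Cusum()]    # one detector per action's reward stream
    detections = []
    state = 0                         # the toy MDP has a single state
    for t in range(steps):
        q = q_tables[-1]
        # Epsilon-greedy action selection on the current context's Q-table.
        if rng.random() < eps:
            action = rng.randrange(2)
        else:
            action = max(range(2), key=lambda a: q[(state, a)])
        means = means_before if t < change_at else means_after
        reward = rng.gauss(means[action], 0.1)
        next_state = 0
        # Standard Q-learning update toward the one-step bootstrapped target.
        target = reward + gamma * max(q[(next_state, a)] for a in range(2))
        q[(state, action)] += alpha * (target - q[(state, action)])
        if detectors[action].update(reward):
            detections.append(t)
            q_tables.append(defaultdict(float))  # start learning a new context
            detectors = [Cusum(), Cusum()]       # re-estimate the new statistics
        state = next_state
    final_q = q_tables[-1]
    greedy = max(range(2), key=lambda a: final_q[(0, a)])
    return detections, greedy
```

Calling `detections, greedy = run()` returns the steps at which a change was flagged and the greedy action of the last context's Q-table. Monitoring each action's reward stream separately keeps the monitored statistic independent of the policy, which is still changing while it learns. Discarding the old Q-table is the simplest possible reaction to a detected change; a method that recognizes and reuses previously seen contexts would need more bookkeeping than shown here.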