Reinforcement learning algorithm for non-stationary environments

Bibliographic details
Published in:	Applied Intelligence (Dordrecht, Netherlands), Vol. 50, No. 11, pp. 3590-3606
Main authors:	Padakandla, Sindhu; K. J., Prabuchandran; Bhatnagar, Shalabh
Format:	Journal Article
Language:	English
Published:	New York: Springer US, 1 November 2020; Springer Nature B.V.
Subjects:	Algorithms; Artificial Intelligence; Computer Science; Decisions; Energy management; Machine learning; Machines; Manufacturing; Markov processes; Mechanical Engineering; Nonstationary environments; Optimization; Processes; Robot control; Robotics; Signal processing; Traffic control; Traffic signals; Markov decision processes; Change detection; Non-Stationary environments; Reinforcement learning
ISSN:	0924-669X, 1573-7497

Description
Abstract:	Reinforcement learning (RL) methods learn optimal decisions in the presence of a stationary environment. However, the stationarity assumption on the environment is very restrictive. In many real-world problems such as traffic signal control and robotic applications, one often encounters situations with non-stationary environments, and in these scenarios, RL methods yield sub-optimal decisions. In this paper, we thus consider the problem of developing RL methods that obtain optimal decisions in a non-stationary environment. The goal of this problem is to maximize the long-term discounted reward accrued when the underlying model of the environment changes over time. To achieve this, we first adapt a change point algorithm to detect changes in the statistics of the environment and then develop an RL algorithm that maximizes the long-run reward accrued. We illustrate that our change point method detects changes in the model of the environment effectively and thus facilitates the RL algorithm in maximizing the long-run reward. We further validate the effectiveness of the proposed solution on non-stationary random Markov decision processes, a sensor energy management problem, and a traffic signal control problem.
DOI:	10.1007/s10489-020-01758-5
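
The two-stage idea in the abstract (detect a change in the environment's statistics, then let the RL learner adapt) can be illustrated with a toy sketch. This is not the paper's algorithm: the record does not specify the change point method, so a simple two-window mean comparison stands in for it, and the names `WindowChangeDetector`, `q_update`, and `run`, the single-state two-action environment, and all parameter values are hypothetical choices made for illustration only.

```python
import random
from collections import deque


class WindowChangeDetector:
    """Illustrative stand-in detector (NOT the paper's change point method).

    Flags a change when the means of two adjacent sliding windows
    differ by more than `threshold`.
    """

    def __init__(self, window=20, threshold=0.5):
        self.window = window
        self.threshold = threshold
        self.buf = deque(maxlen=2 * window)

    def update(self, x):
        self.buf.append(x)
        if len(self.buf) < 2 * self.window:
            return False  # not enough samples to compare two windows yet
        vals = list(self.buf)
        old_mean = sum(vals[:self.window]) / self.window
        new_mean = sum(vals[self.window:]) / self.window
        return abs(new_mean - old_mean) > self.threshold


def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step toward the discounted return."""
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def run(steps=2000, switch_at=1000, epsilon=0.1, seed=0):
    """Toy non-stationary problem (assumed): one state, two actions,
    and the rewarding action flips from 0 to 1 at step `switch_at`."""
    rng = random.Random(seed)
    q = [[0.0, 0.0]]
    # One detector per action, so the monitored reward stream does not
    # depend on how often the policy picks each action.
    detectors = [WindowChangeDetector(), WindowChangeDetector()]
    detections = []
    for t in range(steps):
        best = 0 if t < switch_at else 1
        if rng.random() < epsilon:
            a = rng.randrange(2)                 # explore
        else:
            a = 0 if q[0][0] >= q[0][1] else 1   # exploit
        r = 1.0 if a == best else 0.0
        q_update(q, 0, a, r, 0)
        if detectors[a].update(r):
            detections.append(t)
            # Discard values and statistics learned under the old model.
            q = [[0.0, 0.0]]
            detectors = [WindowChangeDetector(), WindowChangeDetector()]
    return q, detections
```

Calling `run()` returns the final Q-table and the steps at which a change was flagged. Resetting the Q-table on detection is only one possible response; a learner could instead keep a separate table per detected environment model and switch between them.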