Value iteration for simple stochastic games: Stopping criterion and learning algorithm

Bibliographic details
Published in:	Information and Computation, Volume 285; p. 104886
Main authors:	Eisentraut, Julia; Kelmendi, Edon; Křetínský, Jan; Weininger, Maximilian
Format:	Journal Article
Language:	English
Publication details:	Elsevier Inc, 1 May 2022
Subjects:	Markov decision processes; Probabilistic verification; Reachability; Stochastic games; Value iteration
ISSN:	0890-5401, 1090-2651

Description
Summary:	The classical problem of reachability in simple stochastic games is typically solved by value iteration (VI), which produces a sequence of under-approximations of the value of the game, but is only guaranteed to converge in the limit. We provide an additional converging sequence of over-approximations, based on an analysis of the game graph. Together, these two sequences entail the first error bound and hence the first stopping criterion for VI on simple stochastic games, indicating when the algorithm can be stopped for a given precision. Consequently, VI becomes an anytime algorithm returning the approximation of the value and the current error bound. We further use this error bound to provide a learning-based asynchronous VI algorithm; it uses simulations and thus often avoids exploring the whole game graph, but still yields the same guarantees. Finally, we experimentally show that the overhead for computing the additional sequence of over-approximations is often negligible.
DOI:	10.1016/j.ic.2022.104886
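
Illustrative sketch (not part of the catalog record or the article): the summary above describes stopping value iteration once an under-approximation and an over-approximation of the game value are close enough. The Python below is a minimal sketch of that stopping idea only, under a strong simplifying assumption: the example game has no end components other than the target and the sink, so the naive over-approximation happens to converge. The graph analysis that the article uses to obtain convergence in general is not reproduced here. The names `bounded_value_iteration` and `GAME`, the state labels, and all probabilities are hypothetical.

```python
from typing import Dict, List, Tuple

# Each state maps to (owner, actions); every action is a probability
# distribution over successor states. Owners are "max" or "min".
Game = Dict[str, Tuple[str, List[Dict[str, float]]]]

# Hypothetical toy game: "goal" is the target, "sink" can never reach it.
GAME: Game = {
    "s0": ("max", [{"s1": 0.5, "goal": 0.5}, {"goal": 0.3, "sink": 0.7}]),
    "s1": ("min", [{"s0": 0.5, "sink": 0.5}, {"goal": 0.8, "sink": 0.2}]),
    "goal": ("max", []),
    "sink": ("max", []),
}


def bounded_value_iteration(
    game: Game, target: str, sink: str, eps: float = 1e-6, max_iters: int = 100_000
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Iterate lower and upper bounds until they differ by less than eps.

    Assumption: no end components besides target and sink; otherwise the
    upper bound computed this way may fail to converge to the value.
    """
    lower = {s: 0.0 for s in game}
    upper = {s: 1.0 for s in game}
    lower[target] = 1.0  # the target has value exactly 1
    upper[sink] = 0.0    # the sink has value exactly 0
    fixed = (target, sink)

    def bellman(values: Dict[str, float], state: str) -> float:
        owner, actions = game[state]
        expected = [sum(p * values[t] for t, p in dist.items()) for dist in actions]
        return max(expected) if owner == "max" else min(expected)

    for _ in range(max_iters):
        # Stopping criterion: the gap bounds the error in every state.
        if max(upper[s] - lower[s] for s in game) < eps:
            break
        lower = {s: lower[s] if s in fixed else bellman(lower, s) for s in game}
        upper = {s: upper[s] if s in fixed else bellman(upper, s) for s in game}
    return lower, upper


lower, upper = bounded_value_iteration(GAME, "goal", "sink")
print(lower["s0"], upper["s0"])
```

In this toy game the value of `s0` is 2/3 and that of `s1` is 1/3, so both printed bounds should lie within the chosen precision of 2/3. If `max_iters` is exhausted first, the function still returns the current bounds, which mirrors the anytime behaviour mentioned in the summary.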