Bisimulation Metrics for Continuous Markov Decision Processes

Bibliographic details
Published in: SIAM Journal on Computing, Vol. 40, No. 6, pp. 1662-1714
Main authors: Ferns, Norm; Panangaden, Prakash; Precup, Doina
Format: Journal Article
Language: English
Publication details: Philadelphia, PA: Society for Industrial and Applied Mathematics, 01.01.2011
ISSN:0097-5397, 1095-7111
Description
Abstract: In recent years, various metrics have been developed for measuring the behavioral similarity of states in probabilistic transition systems [J. Desharnais et al., Proceedings of CONCUR'99, Springer-Verlag, London, 1999, pp. 258-273; F. van Breugel and J. Worrell, Proceedings of ICALP'01, Springer-Verlag, London, 2001, pp. 421-432]. In the context of finite Markov decision processes (MDPs), we have built on these metrics to provide a robust quantitative analogue of stochastic bisimulation [N. Ferns, P. Panangaden, and D. Precup, Proceedings of UAI-04, AUAI Press, Arlington, VA, 2004, pp. 162-169] and an efficient algorithm for its calculation [N. Ferns, P. Panangaden, and D. Precup, Proceedings of UAI-06, AUAI Press, Arlington, VA, 2006, pp. 174-181]. In this paper, we seek to properly extend these bisimulation metrics to MDPs with continuous state spaces. In particular, we provide the first distance-estimation scheme for metrics based on bisimulation for continuous probabilistic transition systems. Our work, based on statistical sampling and infinite-dimensional linear programming, is a crucial first step in formally guiding real-world planning, where tasks are usually continuous and highly stochastic in nature (e.g., robot navigation), and where a parametric model or crude finite approximation must often be substituted. We show that the optimal value function associated with a discounted infinite-horizon planning task is continuous with respect to metric distances. Thus, our metrics allow one to reason about the quality of the solution obtained by replacing one model with another. Alternatively, they may potentially be used directly for state aggregation.
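To make the idea behind the abstract concrete, here is a minimal, dependency-free sketch of a bisimulation metric on a tiny finite MDP whose transitions are given as lists of sampled successor states (equal-weight empirical distributions). It is NOT the paper's continuous-state estimation scheme. The names `kantorovich_empirical`, `bisim_metric`, `value_iteration`, the constants `c_r = 0.1`, `c_t = 0.9`, and the three-state toy MDP are all hypothetical choices for illustration. The sketch iterates the update d(s,t) <- max_a [ c_r |r(s,a) - r(t,a)| + c_t K(d)(P(s,a), P(t,a)) ], where K(d) is the Kantorovich distance induced by the current d, starting from the zero metric. Under our understanding of the finite-MDP results the abstract builds on, when gamma <= c_t the optimal values satisfy c_r |V*(s) - V*(t)| <= d(s,t); the snippet only checks this on the toy example.

```python
from itertools import permutations

# Hypothetical toy setup: samples[s][a] is a list of N sampled successor
# states, i.e. an empirical transition distribution with equal weights 1/N.


def kantorovich_empirical(xs, ys, d):
    """Kantorovich (Wasserstein-1) distance between two empirical measures
    with the same number of equally weighted atoms, under ground metric d.

    For equal-weight atoms an optimal coupling can be taken to be a
    permutation (Birkhoff-von Neumann), so brute force is exact, but it is
    only practical for very small N.
    """
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("need two non-empty sample lists of equal length")
    return min(sum(d[x][y] for x, y in zip(xs, perm)) / n
               for perm in permutations(ys))


def bisim_metric(states, actions, reward, samples, c_r=0.1, c_t=0.9, iters=200):
    """Iterate d <- F(d) from the zero metric, where
    F(d)(s, t) = max_a [c_r |r(s,a) - r(t,a)| + c_t * K(d)(P(s,a), P(t,a))].
    F is a contraction with factor c_t < 1, so the iterates converge."""
    d = {s: {t: 0.0 for t in states} for s in states}
    for _ in range(iters):
        # The right-hand side reads the previous d before it is rebound.
        d = {s: {t: max(c_r * abs(reward[s][a] - reward[t][a])
                        + c_t * kantorovich_empirical(samples[s][a], samples[t][a], d)
                        for a in actions)
                 for t in states}
             for s in states}
    return d


def value_iteration(states, actions, reward, samples, gamma=0.9, iters=500):
    """Optimal discounted values for the same empirical transition model."""
    v = {s: 0.0 for s in states}
    for _ in range(iters):
        v = {s: max(reward[s][a]
                    + gamma * sum(v[x] for x in samples[s][a]) / len(samples[s][a])
                    for a in actions)
             for s in states}
    return v


# Toy MDP: s0 and s1 behave identically; s2 is the only rewarding state.
states = ["s0", "s1", "s2"]
actions = ["a"]
reward = {"s0": {"a": 0.0}, "s1": {"a": 0.0}, "s2": {"a": 1.0}}
samples = {"s0": {"a": ["s0", "s2"]},
           "s1": {"a": ["s1", "s2"]},
           "s2": {"a": ["s2", "s2"]}}

d = bisim_metric(states, actions, reward, samples)
v = value_iteration(states, actions, reward, samples, gamma=0.9)
print(round(d["s0"]["s1"], 6), round(d["s0"]["s2"], 6))  # prints: 0.0 0.181818
```

Here s0 and s1 end up at distance 0 (they are bisimilar), while d(s0, s2) solves x = 0.1 + 0.45x, giving 0.1/0.55. With gamma = c_t = 0.9, c_r |V*(s0) - V*(s2)| happens to equal d(s0, s2) exactly in this example. The permutation search only keeps the sketch free of dependencies; a realistic finite implementation would solve the Kantorovich transportation problem as a linear program, and the paper's continuous setting relies on infinite-dimensional linear programming and sampling instead.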
DOI:10.1137/10080484X