Scenario-Based Verification of Uncertain MDPs

We consider Markov decision processes (MDPs) in which the transition probabilities and rewards belong to an uncertainty set parametrized by a collection of random variables. The probability distributions for these random parameters are unknown. The problem is to compute the probability to satisfy a...

Celý popis

Uložené v:

Podrobná bibliografia
Vydané v:	Tools and algorithms for the construction and analysis of systems : 26th International Conference, TACAS 2020, held as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2020, Dublin, Ireland, April 25-30 Ročník 12078; s. 287
Hlavní autori:	Cubuktepe, Murat, Jansen, Nils, Junges, Sebastian, Katoen, Joost-Pieter, Topcu, Ufuk
Médium:	Journal Article
Jazyk:	English
Vydavateľské údaje:	Switzerland 01.04.2020
Predmet:	MDP Uncertainty Verification Scenario optimisation
On-line prístup:	Zistit podrobnosti o prístupe
Tagy:	Pridať tag Žiadne tagy, Buďte prvý, kto otaguje tento záznam!

Popis
Shrnutí:	We consider Markov decision processes (MDPs) in which the transition probabilities and rewards belong to an uncertainty set parametrized by a collection of random variables. The probability distributions for these random parameters are unknown. The problem is to compute the probability to satisfy a temporal logic specification within any MDP that corresponds to a sample from these unknown distributions. In general, this problem is undecidable, and we resort to techniques from so-called scenario optimization. Based on a finite number of samples of the uncertain parameters, each of which induces an MDP, the proposed method estimates the probability of satisfying the specification by solving a finite-dimensional convex optimization problem. The number of samples required to obtain a high confidence on this estimate is independent from the number of states and the number of random parameters. Experiments on a large set of benchmarks show that a few thousand samples suffice to obtain high-quality confidence bounds with a high probability.
Bibliografia:	ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23
DOI:	10.1007/978-3-030-45190-5_16