Value iteration for simple stochastic games: Stopping criterion and learning algorithm

Bibliographic details
Published in: Information and Computation, vol. 285, p. 104886
Main authors: Eisentraut, Julia; Kelmendi, Edon; Křetínský, Jan; Weininger, Maximilian
Format: Journal Article
Language: English
Published: Elsevier Inc, 1 May 2022
ISSN: 0890-5401, 1090-2651
Abstract The classical problem of reachability in simple stochastic games is typically solved by value iteration (VI), which produces a sequence of under-approximations of the value of the game, but is only guaranteed to converge in the limit. We provide an additional converging sequence of over-approximations, based on an analysis of the game graph. Together, these two sequences entail the first error bound and hence the first stopping criterion for VI on simple stochastic games, indicating when the algorithm can be stopped for a given precision. Consequently, VI becomes an anytime algorithm returning the approximation of the value and the current error bound. We further use this error bound to provide a learning-based asynchronous VI algorithm; it uses simulations and thus often avoids exploring the whole game graph, but still yields the same guarantees. Finally, we experimentally show that the overhead for computing the additional sequence of over-approximations often is negligible.
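The stopping criterion described in the abstract can be illustrated with a minimal sketch: iterate a lower and an upper bound side by side and stop once they differ by at most the requested precision. This is not the authors' algorithm. It deliberately assumes a toy game in which every play leaves the non-terminal states with positive probability, so the naive upper-bound iteration already converges and the game-graph analysis mentioned in the abstract is not needed. The function name `bounded_value_iteration`, the state names, and the game encoding are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: an action is a list of (probability, successor) pairs;
# a game maps each non-terminal state to (owner, available actions).
Action = List[Tuple[float, str]]
Game = Dict[str, Tuple[str, List[Action]]]


def bounded_value_iteration(game: Game, target: str, sink: str,
                            eps: float = 1e-6, max_iters: int = 100_000):
    """Iterate lower and upper bounds until they are eps-close everywhere.

    Assumption: the game has no end components among non-terminal states,
    so the plain upper-bound iteration converges. General simple stochastic
    games need extra treatment that this sketch omits.
    """
    states = set(game) | {target, sink}
    lower = {s: 0.0 for s in states}
    upper = {s: 1.0 for s in states}
    lower[target] = 1.0  # the target is reached with probability 1
    upper[sink] = 0.0    # the sink never reaches the target

    def backup(bound: Dict[str, float]) -> Dict[str, float]:
        # One synchronous Bellman update; target and sink keep their values.
        new = dict(bound)
        for s, (owner, actions) in game.items():
            vals = [sum(p * bound[t] for p, t in a) for a in actions]
            new[s] = max(vals) if owner == "max" else min(vals)
        return new

    for iteration in range(1, max_iters + 1):
        lower, upper = backup(lower), backup(upper)
        gap = max(upper[s] - lower[s] for s in states)
        if gap <= eps:  # stopping criterion: the error bound is small enough
            return lower, upper, iteration
    raise RuntimeError("precision not reached within max_iters")


# Toy game (made up for illustration): the maximizer owns s0, the minimizer s1.
toy_game: Game = {
    "s0": ("max", [[(0.5, "s1"), (0.5, "sink")],
                   [(0.3, "goal"), (0.7, "sink")]]),
    "s1": ("min", [[(0.5, "s0"), (0.5, "goal")],
                   [(1.0, "goal")]]),
}

lower, upper, iterations = bounded_value_iteration(toy_game, "goal", "sink")
print(f"s0 in [{lower['s0']:.6f}, {upper['s0']:.6f}] after {iterations} iterations")
```

Because both bounds are available at every step, the loop can be interrupted at any time and still report an interval that contains the value, which is the anytime behaviour the abstract refers to.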
ArticleNumber 104886
Author Eisentraut, Julia
Kelmendi, Edon
Weininger, Maximilian
Křetínský, Jan
Author_xml – sequence: 1
  givenname: Julia
  orcidid: 0000-0002-7735-8751
  surname: Eisentraut
  fullname: Eisentraut, Julia
  organization: Technical University of Munich, Germany
– sequence: 2
  givenname: Edon
  orcidid: 0000-0003-3100-1500
  surname: Kelmendi
  fullname: Kelmendi, Edon
  organization: University of Oxford, United Kingdom
– sequence: 3
  givenname: Jan
  surname: Křetínský
  fullname: Křetínský, Jan
  organization: Technical University of Munich, Germany
– sequence: 4
  givenname: Maximilian
  orcidid: 0000-0002-0163-2152
  surname: Weininger
  fullname: Weininger, Maximilian
  email: maxi.weininger@tum.de
  organization: Technical University of Munich, Germany
CitedBy_id crossref_primary_10_1016_j_ic_2024_105193
crossref_primary_10_4204_EPTCS_428_4
crossref_primary_10_1016_j_ic_2024_105214
crossref_primary_10_1016_j_ic_2024_105236
ContentType Journal Article
Copyright 2022 The Authors
Copyright_xml – notice: 2022 The Authors
DOI 10.1016/j.ic.2022.104886
Discipline Engineering
Computer Science
EISSN 1090-2651
ExternalDocumentID 10_1016_j_ic_2022_104886
S0890540122000281
GrantInformation_xml – fundername: German Research Foundation
  grantid: KR 4890/2-1
  funderid: https://doi.org/10.13039/501100001659
– fundername: TUM IGSSE
  grantid: 10.06
– fundername: the German Excellence Initiative and the European Union Seventh Framework Programme
  grantid: 291763 for TUM – IAS
– fundername: Studienstiftung des Deutschen Volkes
  funderid: https://doi.org/10.13039/501100004350
ISICitedReferencesCount 13
ISSN 0890-5401
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Keywords Probabilistic verification
Stochastic games
Value iteration
Markov decision processes
Reachability
Language English
License This is an open access article under the CC BY-NC-ND license.
ORCID 0000-0003-3100-1500
0000-0002-0163-2152
0000-0002-7735-8751
OpenAccessLink https://dx.doi.org/10.1016/j.ic.2022.104886
PublicationCentury 2000
PublicationDate May 2022
2022-05-00
PublicationDateYYYYMMDD 2022-05-01
PublicationDate_xml – month: 05
  year: 2022
  text: May 2022
PublicationDecade 2020
PublicationTitle Information and computation
PublicationYear 2022
Publisher Elsevier Inc
Publisher_xml – name: Elsevier Inc
– volume: 29
  start-page: 33
  year: 2006
  ident: 10.1016/j.ic.2022.104886_br0460
  article-title: Performance analysis of probabilistic timed automata using digital clocks
  publication-title: Form. Methods Syst. Des.
  doi: 10.1007/s10703-006-0005-2
– start-page: 303
  year: 2012
  ident: 10.1016/j.ic.2022.104886_br0130
  article-title: Compositional reverification of probabilistic safety properties for large-scale complex IT systems
– volume: 7
  start-page: 4:1
  year: 2012
  ident: 10.1016/j.ic.2022.104886_br0540
  article-title: Host selection through collective decision
  publication-title: ACM Trans. Auton. Adapt. Syst.
  doi: 10.1145/2168260.2168264
– year: 2002
  ident: 10.1016/j.ic.2022.104886_br0270
– volume: 79
  start-page: 640
  year: 2013
  ident: 10.1016/j.ic.2022.104886_br0150
  article-title: Strategy improvement for concurrent reachability and turn-based stochastic safety games
  publication-title: J. Comput. Syst. Sci.
  doi: 10.1016/j.jcss.2012.12.001
SSID ssj0011546
SourceID crossref
elsevier
SourceType Enrichment Source
Index Database
Publisher
StartPage 104886
SubjectTerms Markov decision processes
Probabilistic verification
Reachability
Stochastic games
Value iteration
Title Value iteration for simple stochastic games: Stopping criterion and learning algorithm
URI https://dx.doi.org/10.1016/j.ic.2022.104886
Volume 285
journalDatabaseRights – providerCode: PRVESC
  databaseName: ScienceDirect (Freedom Collection)
  customDbUrl:
  eissn: 1090-2651
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0011546
  issn: 0890-5401
  databaseCode: AIEXJ
  dateStart: 20211207
  isFulltext: true
  titleUrlDefault: https://www.sciencedirect.com
  providerName: Elsevier
linkProvider Elsevier