Quantile Markov Decision Processes


Bibliographic Details
Published in: Operations Research, Vol. 70, No. 3, p. 1428
Main Authors: Li, Xiaocheng; Zhong, Huaiyang; Brandeau, Margaret L
Format: Journal Article
Language: English
Published: United States, 01.05.2022
Subjects:
ISSN: 0030-364X
Abstract: The goal of a traditional Markov decision process (MDP) is to maximize expected cumulative reward over a defined horizon (possibly infinite). In many applications, however, a decision maker may be interested in optimizing a specific quantile of the cumulative reward instead of its expectation. In this paper we consider the problem of optimizing the quantiles of the cumulative rewards of an MDP, which we refer to as a quantile Markov decision process (QMDP). We provide analytical results characterizing the optimal QMDP value function and present a dynamic programming-based algorithm to solve for the optimal policy. The algorithm also extends to the MDP problem with a conditional value-at-risk (CVaR) objective. We illustrate the practical relevance of our model by evaluating it on an HIV treatment initiation problem, where patients aim to balance the potential benefits and risks of the treatment.
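To make the quantile objective concrete, the short Python simulation below is an illustrative sketch only, not the dynamic-programming algorithm from the paper: the one-state MDP, the two policies, and every number in it (HORIZON, TAU, the reward model) are invented for illustration. It shows that the policy with the best expected cumulative reward need not be the policy with the best 10th-percentile cumulative reward.

import numpy as np

# Hypothetical toy one-state, two-action MDP (not from the paper).
# Action 0 ("safe") pays 1 per step; action 1 ("risky") pays 10 with probability 0.2, else 0.
HORIZON = 5           # number of decision epochs
TAU = 0.1             # quantile level of interest (10th percentile)
N_EPISODES = 100_000  # Monte Carlo sample size

rng = np.random.default_rng(0)

def step_reward(action, rng):
    """One-step reward under the given action."""
    if action == 0:
        return 1.0
    return 10.0 if rng.random() < 0.2 else 0.0

def episode_return(action, rng):
    """Cumulative reward of the stationary policy that always plays `action`."""
    return sum(step_reward(action, rng) for _ in range(HORIZON))

for action, name in [(0, "safe policy"), (1, "risky policy")]:
    returns = np.array([episode_return(action, rng) for _ in range(N_EPISODES)])
    print(f"{name}: mean = {returns.mean():.2f}, "
          f"{int(TAU * 100)}th percentile = {np.quantile(returns, TAU):.2f}")

With these toy numbers the risky policy wins on expected cumulative reward (about 10 versus 5), while the safe policy wins on the 10th percentile (5 versus 0), because the risky policy earns nothing in roughly 33% of episodes (0.8^5 of them). A quantile or CVaR objective is designed to capture exactly this kind of ranking reversal.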
Authors: Li, Xiaocheng; Zhong, Huaiyang; Brandeau, Margaret L
Author Affiliations:
– Li, Xiaocheng: Department of Management Science and Engineering, Stanford University, Stanford, CA 94305
– Zhong, Huaiyang: Department of Management Science and Engineering, Stanford University, Stanford, CA 94305
– Brandeau, Margaret L: Department of Management Science and Engineering, Stanford University, Stanford, CA 94305
Cited By (Crossref DOIs):
10.1007/s10479-025-06738-x
10.1016/j.automatica.2025.112318
10.1007/s11081-023-09800-4
DOI: 10.1287/opre.2021.2123
Discipline: Engineering; Sciences (General); Computer Science; Business
Grant Information:
– NIDA NIH HHS, grant R37 DA015612
– NIDA NIH HHS, grant R01 DA015612
Issue: 3
Keywords: Markov Decision Process; Dynamic Programming; Risk Measure; Quantile; Medical Decision Making
Open Access Link: https://www.ncbi.nlm.nih.gov/pmc/articles/9401554
PMID: 36034163
Publication Date: 2022-05-01
Publication Place: United States
Publication Title: Operations Research
Publication Title (Alternate): Oper Res
Publication Year: 2022
Start Page: 1428
Title: Quantile Markov Decision Processes
URI: https://www.ncbi.nlm.nih.gov/pubmed/36034163
URI: https://www.proquest.com/docview/2707873507
Volume: 70