Quantile Markov Decision Processes
| Published in: | Operations research Vol. 70; no. 3; p. 1428 |
|---|---|
| Main Authors: | Li, Xiaocheng; Zhong, Huaiyang; Brandeau, Margaret L |
| Format: | Journal Article |
| Language: | English |
| Published: | United States, 01.05.2022 |
| ISSN: | 0030-364X |
| Abstract | The goal of a traditional Markov decision process (MDP) is to maximize expected cumulative reward over a defined horizon (possibly infinite). In many applications, however, a decision maker may be interested in optimizing a specific quantile of the cumulative reward instead of its expectation. In this paper we consider the problem of optimizing the quantiles of the cumulative rewards of an MDP, which we refer to as a quantile Markov decision process (QMDP). We provide analytical results characterizing the optimal QMDP value function and present a dynamic programming-based algorithm to solve for the optimal policy. The algorithm also extends to the MDP problem with a conditional value-at-risk (CVaR) objective. We illustrate the practical relevance of our model by evaluating it on an HIV treatment initiation problem, where patients aim to balance the potential benefits and risks of the treatment. |
|---|---|
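The abstract's distinction between maximizing the expectation and maximizing a quantile of cumulative reward can be illustrated with a toy Monte Carlo sketch. This is a hypothetical example, not the paper's dynamic-programming algorithm: the two actions, their payoffs, and the `cumulative_reward`/`evaluate` helpers are all made up for illustration.

```python
# Hypothetical sketch contrasting an expected-reward objective with a
# quantile objective on a toy two-action problem (not the QMDP algorithm).
import random

random.seed(0)

def cumulative_reward(action, horizon=10):
    """Simulate one episode's cumulative reward for a fixed action.
    'safe' pays a steady 1.0 per step; 'risky' pays 2.0 per step with
    probability 0.9 but -4.0 with probability 0.1."""
    total = 0.0
    for _ in range(horizon):
        if action == "safe":
            total += 1.0
        else:
            total += 2.0 if random.random() < 0.9 else -4.0
    return total

def evaluate(action, n=20000, tau=0.1):
    """Estimate the mean and the empirical tau-quantile of the
    cumulative-reward distribution for a fixed action."""
    samples = sorted(cumulative_reward(action) for _ in range(n))
    mean = sum(samples) / n
    quantile = samples[int(tau * n)]
    return mean, quantile

for action in ("safe", "risky"):
    mean, q10 = evaluate(action)
    print(f"{action:>5}: mean={mean:6.2f}  10%-quantile={q10:6.2f}")
```

Under these made-up payoffs the risky action has the higher mean (about 14 versus exactly 10) but the lower 10% quantile (8 versus 10), so an expectation-maximizing decision maker and a quantile-maximizing one choose different actions, which is the tension the QMDP formulation addresses.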
| Authors: | Li, Xiaocheng; Zhong, Huaiyang; Brandeau, Margaret L (all: Department of Management Science and Engineering, Stanford University, Stanford, CA 94305) |
| DOI: | 10.1287/opre.2021.2123 |
| Subjects: | Engineering Sciences (General); Computer Science; Business |
| Funding: | NIDA NIH HHS grants R37 DA015612 and R01 DA015612 |
| Keywords: | Markov Decision Process; Dynamic Programming; Risk Measure; Quantile; Medical Decision Making |
| Open Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/9401554 |
| PMID: | 36034163 |