Reinforcement learning with dynamic convex risk measures
We develop an approach for solving time‐consistent risk‐sensitive stochastic optimization problems using model‐free reinforcement learning (RL). Specifically, we assume agents assess the risk of a sequence of random variables using dynamic convex risk measures. We employ a time‐consistent dynamic programming principle to determine the value of a particular policy, and develop policy gradient update rules that aid in obtaining optimal policies.
Saved in:
| Published in: | Mathematical finance Vol. 34; No. 2; pp. 557–587 |
|---|---|
| Main authors: | Coache, Anthony; Jaimungal, Sebastian |
| Format: | Journal Article |
| Language: | English |
| Published: | Oxford: Blackwell Publishing Ltd, 01.04.2024 |
| Subjects: | |
| ISSN: | 0960-1627, 1467-9965 |
| Online access: | Full text |
| Abstract | We develop an approach for solving time‐consistent risk‐sensitive stochastic optimization problems using model‐free reinforcement learning (RL). Specifically, we assume agents assess the risk of a sequence of random variables using dynamic convex risk measures. We employ a time‐consistent dynamic programming principle to determine the value of a particular policy, and develop policy gradient update rules that aid in obtaining optimal policies. We further develop an actor–critic style algorithm using neural networks to optimize over policies. Finally, we demonstrate the performance and flexibility of our approach by applying it to three optimization problems: statistical arbitrage trading strategies, financial hedging, and obstacle avoidance robot control. |
|---|---|
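The abstract describes evaluating policies under a dynamic convex risk measure instead of a plain expectation. As a minimal, self-contained sketch of that underlying idea (this is not the authors' algorithm), the snippet below scores actions by the empirical conditional value-at-risk (CVaR) of their sampled costs; CVaR is one standard example of a convex risk measure, and under it a heavy-tailed action loses to a safe one even when its mean cost is lower. All names (`cvar`, `risk_sensitive_choice`) and the toy cost distributions are hypothetical.

```python
import random
import statistics

def cvar(samples, alpha=0.9):
    """Empirical conditional value-at-risk: the average cost in the worst
    (1 - alpha) tail. One simple example of a convex risk measure."""
    xs = sorted(samples)
    k = max(1, int(len(xs) * (1 - alpha)))
    return statistics.mean(xs[-k:])  # mean of the k largest costs

def risk_sensitive_choice(cost_dists, n_samples=10_000, alpha=0.9, seed=0):
    """Pick the action whose sampled costs have the smallest CVaR.
    `cost_dists` maps an action name to a sampler of a random cost."""
    rng = random.Random(seed)
    scores = {a: cvar([draw(rng) for _ in range(n_samples)], alpha)
              for a, draw in cost_dists.items()}
    return min(scores, key=scores.get), scores

# A "safe" action (constant cost 1.0) versus a "risky" one with lower mean
# cost (0.5) but a heavy right tail: a risk-neutral agent would prefer the
# risky action, while the CVaR criterion prefers the safe one.
actions = {
    "safe": lambda rng: 1.0,
    "risky": lambda rng: 0.0 if rng.random() < 0.9 else 5.0,
}
best, scores = risk_sensitive_choice(actions)
```

In the paper this risk assessment is applied recursively over time (a dynamic, time-consistent risk measure) and the critic is a neural network rather than an empirical tail average; the sketch only illustrates the static, single-step criterion.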
| Author | Coache, Anthony; Jaimungal, Sebastian |
| Affiliations | Anthony Coache (Department of Statistical Sciences, University of Toronto, Toronto, Canada); Sebastian Jaimungal (Department of Statistical Sciences, University of Toronto, Toronto, Canada; Oxford‐Man Institute, University of Oxford, Oxford, United Kingdom; ORCID 0000-0002-0193-0993) |
| Copyright | 2023. This article is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
| DOI | 10.1111/mafi.12388 |
| Discipline | Mathematics Business |
| EISSN | 1467-9965 |
| EndPage | 587 |
| ISSN | 0960-1627 |
| Issue | 2 |
| Language | English |
| ORCID | 0000-0002-0193-0993 |
| OpenAccessLink | https://onlinelibrary.wiley.com/doi/pdfdirect/10.1111/mafi.12388 |
| PageCount | 31 |
| PublicationDate | 2024-04-01 |
| PublicationPlace | Oxford |
| PublicationTitle | Mathematical finance |
| PublicationYear | 2024 |
| Publisher | Blackwell Publishing Ltd |
| StartPage | 557 |
| SubjectTerms | Algorithms Arbitrage Dynamic programming Flexibility Hedging Learning Machine learning Neural networks Obstacle avoidance Optimization Policies Random variables Reinforcement Risk Risk assessment Robot control |
| Volume | 34 |