Risk-sensitive reinforcement learning
| Field | Value |
|---|---|
| Published in: | Neural Computation, Volume 26, Issue 7, p. 1298 |
| Main authors: | Shen, Yun (Technical University Berlin, Germany); Tobia, Michael J.; Sommer, Tobias; Obermayer, Klaus |
| Format: | Journal Article |
| Language: | English |
| Published: | United States, 2014-07-01 |
| Subjects: | Algorithms; Brain - physiology; Brain Mapping; Decision Making - physiology; Humans; Magnetic Resonance Imaging; Markov Chains; Models, Psychological; Nonlinear Dynamics; Oxygen - blood; Probability; Reinforcement (Psychology); Risk |
| ISSN: | 1530-888X (online) |
| DOI: | 10.1162/NECO_a_00600 |
| PMID: | 24708369 |
| Abstract | We derive a family of risk-sensitive reinforcement learning methods for agents who face sequential decision-making tasks in uncertain environments. By applying a utility function to the temporal difference (TD) error, nonlinear transformations are effectively applied not only to the received rewards but also to the true transition probabilities of the underlying Markov decision process. When appropriate utility functions are chosen, the agents' behaviors express key features of human behavior as predicted by prospect theory (Kahneman & Tversky, 1979), for example, different risk preferences for gains and losses, as well as the shape of subjective probability curves. We derive a risk-sensitive Q-learning algorithm, which is necessary for modeling human behavior when transition probabilities are unknown, and prove its convergence. As a proof of principle for the applicability of the new framework, we apply it to quantify human behavior in a sequential investment task. We find that the risk-sensitive variant provides a significantly better fit to the behavioral data and that it leads to an interpretation of the subject's responses that is indeed consistent with prospect theory. The analysis of simultaneously measured fMRI signals shows a significant correlation of the risk-sensitive TD error with BOLD signal change in the ventral striatum. In addition, we find a significant correlation of the risk-sensitive Q-values with neural activity in the striatum, cingulate cortex, and insula that is not present if standard Q-values are used. |
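To make the core idea from the abstract concrete, here is a minimal sketch of a risk-sensitive Q-learning step: the utility function u is applied to the TD error δ = r + γ max_a' Q(s', a') − Q(s, a) rather than to the reward. The piecewise-linear utility and the parameter names and values (`kappa`, `alpha`, `gamma`) below are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

def utility(delta, kappa=-0.3):
    """Piecewise-linear utility u applied to the TD error delta.

    Slope (1 + kappa) for gains (delta >= 0) and (1 - kappa) for losses,
    with kappa in (-1, 1): kappa < 0 makes negative TD errors loom larger
    (risk-averse), kappa > 0 the opposite, and kappa = 0 recovers standard
    Q-learning. This particular u is an illustrative assumption; the
    framework admits other increasing utility functions with u(0) = 0.
    """
    return (1.0 + kappa) * delta if delta >= 0 else (1.0 - kappa) * delta

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95, kappa=-0.3):
    """One risk-sensitive Q-learning step on a tabular Q (states x actions).

    The utility transforms the TD error itself, not the reward, which is
    what effectively reshapes both rewards and transition probabilities,
    as the abstract explains.
    """
    delta = r + gamma * np.max(Q[s_next]) - Q[s, a]  # ordinary TD error
    Q[s, a] += alpha * utility(delta, kappa)         # utility-transformed update
    return Q
```

With `kappa = 0` this reduces to the standard Q-learning update, so fitting `kappa` to observed choices is one way to quantify a subject's risk sensitivity, in the spirit of the investment-task analysis described in the abstract.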