Risk-sensitive reinforcement learning

Bibliographic Details

Published in: Neural Computation, Volume 26, Issue 7, p. 1298
Main Authors: Shen, Yun (Technical University, 10587 Berlin, Germany; yun@ni.tu-berlin.de); Tobia, Michael J.; Sommer, Tobias; Obermayer, Klaus
Format: Journal Article; Genre: Letter (Correspondence)
Language: English
Published: United States, July 1, 2014
DOI: 10.1162/NECO_a_00600
ISSN (electronic): 1530-888X
PMID: 24708369
Peer Reviewed: Yes
Cited By: 73 records (Web of Science)
Subjects: Algorithms; Brain - physiology; Brain Mapping; Decision Making - physiology; Humans; Magnetic Resonance Imaging; Markov Chains; Models, Psychological; Nonlinear Dynamics; Oxygen - blood; Probability; Reinforcement (Psychology); Risk
Online Access:
  PubMed: https://www.ncbi.nlm.nih.gov/pubmed/24708369
  ProQuest: https://www.proquest.com/docview/1531957587
Abstract

We derive a family of risk-sensitive reinforcement learning methods for agents who face sequential decision-making tasks in uncertain environments. By applying a utility function to the temporal difference (TD) error, nonlinear transformations are effectively applied not only to the received rewards but also to the true transition probabilities of the underlying Markov decision process. When appropriate utility functions are chosen, the agents' behaviors express key features of human behavior as predicted by prospect theory (Kahneman & Tversky, 1979), for example, different risk preferences for gains and losses, as well as the shape of subjective probability curves. We derive a risk-sensitive Q-learning algorithm, which is necessary for modeling human behavior when transition probabilities are unknown, and prove its convergence. As a proof of principle for the applicability of the new framework, we apply it to quantify human behavior in a sequential investment task. We find that the risk-sensitive variant provides a significantly better fit to the behavioral data and that it leads to an interpretation of the subjects' responses that is indeed consistent with prospect theory. The analysis of simultaneously measured fMRI signals shows a significant correlation of the risk-sensitive TD error with BOLD signal change in the ventral striatum. In addition, we find a significant correlation of the risk-sensitive Q-values with neural activity in the striatum, cingulate cortex, and insula that is not present if standard Q-values are used.
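The abstract specifies the method only in prose: a utility function u is applied to the TD error, so the tabular update becomes Q(s,a) <- Q(s,a) + alpha * u(r + gamma * max_a' Q(s',a') - Q(s,a)). The Python sketch below illustrates one way such an update could look; the piecewise-power utility and its parameters (curvature 0.88, loss aversion 2.25) are illustrative assumptions in the spirit of prospect theory, not values taken from this record.

import numpy as np

# Illustrative prospect-theory-style utility with u(0) = 0. The
# piecewise-power form and the parameter values are assumptions
# borrowed from Tversky & Kahneman's published estimates, not
# values reported in this record.
def utility(x, curvature=0.88, loss_aversion=2.25):
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0.0,
                    np.abs(x) ** curvature,
                    -loss_aversion * np.abs(x) ** curvature)

def risk_sensitive_q_update(Q, s, a, r, s_next, lr=0.1, gamma=0.95):
    # Standard TD error of tabular Q-learning ...
    td_error = r + gamma * np.max(Q[s_next]) - Q[s, a]
    # ... but the increment passes through the utility function,
    # which is what makes the update risk sensitive.
    Q[s, a] += lr * float(utility(td_error))
    return Q

# Toy usage on a 3-state, 2-action table: a loss of 1 moves the
# estimate further than a gain of 1, because the utility weights
# losses more heavily.
Q = np.zeros((3, 2))
risk_sensitive_q_update(Q, s=0, a=0, r=+1.0, s_next=1)
risk_sensitive_q_update(Q, s=0, a=1, r=-1.0, s_next=1)
print(Q[0])  # approx [0.1, -0.225]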