Dopamine reward prediction errors reflect hidden-state inference across time

Bibliographic details
Published in:	Nature Neuroscience, Vol. 20, No. 4, pp. 581-589
Main authors:	Starkweather, Clara Kwon; Babayan, Benedicte M; Uchida, Naoshige; Gershman, Samuel J
Format:	Journal Article
Language:	English
Published:	New York: Nature Publishing Group US, 01.04.2017
Subjects:	631/378/116/2396; 631/378/1788; 9/97; Animal Genetics and Genomics; Animals; Association Learning - physiology; Behavioral Sciences; Biological Techniques; Biomedicine; Dopamine; Dopaminergic Neurons - physiology; Health aspects; Inference; Male; Mice; Models, Neurological; Neurobiology; Neurosciences; Normal distribution; Odors; Physiological aspects; Psychological aspects; Recording sessions; Reward; Rewards (Psychology); Time Factors; Ventral Tegmental Area - physiology; United States; United States > US Massachusetts
ISSN:	1097-6256, 1546-1726

Description
Summary:	A long-standing idea in modern neuroscience is that the brain computes inferences about the outside world rather than passively observing its environment. The authors record from midbrain dopamine neurons during tasks with different reward contingencies and show that responses are consistent with a learning rule that harnesses hidden-state inference. Midbrain dopamine neurons signal reward prediction error (RPE), or actual minus expected reward. The temporal difference (TD) learning model has been a cornerstone in understanding how dopamine RPEs could drive associative learning. Classically, TD learning imparts value to features that serially track elapsed time relative to observable stimuli. In the real world, however, sensory stimuli provide ambiguous information about the hidden state of the environment, leading to the proposal that TD learning might instead compute a value signal based on an inferred distribution of hidden states (a 'belief state'). Here we asked whether dopaminergic signaling supports a TD learning framework that operates over hidden states. We found that dopamine signaling showed a notable difference between two tasks that differed only with respect to whether reward was delivered in a deterministic manner. Our results favor an associative learning rule that combines cached values with hidden-state inference.
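
The quantities named in the summary can be illustrated with a minimal sketch. It is not the authors' model: it assumes a linear value function over a belief state (one weight per hidden state), a discount factor `gamma`, and a learning rate, none of which are specified in this record. The function names `belief_value`, `td_error`, and `td_update` and all numbers are hypothetical, chosen only to show how an RPE of the form "actual minus expected reward" arises in TD learning over beliefs.

```python
def belief_value(belief, weights):
    """Expected value under a belief state (a probability distribution over
    hidden states), assuming one cached value (weight) per hidden state."""
    return sum(b * w for b, w in zip(belief, weights))


def td_error(reward, belief, next_belief, weights, gamma=0.9):
    """TD reward prediction error: received reward plus the discounted value
    of the next belief, minus the value of the current belief.
    gamma is an assumed discount factor."""
    return (reward
            + gamma * belief_value(next_belief, weights)
            - belief_value(belief, weights))


def td_update(weights, belief, delta, learning_rate=0.1):
    """Move each weight in proportion to the RPE and to how strongly the
    corresponding hidden state was believed to be occupied."""
    return [w + learning_rate * delta * b for w, b in zip(weights, belief)]


# Hypothetical example with two hidden states.
weights = [0.0, 1.0]        # assumed cached values per hidden state
belief = [0.5, 0.5]         # the current observation is ambiguous
next_belief = [1.0, 0.0]    # after the outcome, the state is known
delta = td_error(1.0, belief, next_belief, weights)
new_weights = td_update(weights, belief, delta)
```

In this toy case the expected value under the ambiguous belief is 0.5, so a reward of 1.0 followed by a zero-valued state gives a positive prediction error, and both weights increase in proportion to the belief placed on each state.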
DOI:	10.1038/nn.4520