Bandit algorithms for policy learning: methods, implementation, and welfare-performance

Static supervised learning—in which experimental data serves as a training sample for the estimation of an optimal treatment assignment policy—is a commonly assumed framework of policy learning. An arguably more realistic but challenging scenario is a dynamic setting in which the planner performs ex...

Full description

Saved in:
Bibliographic Details
Published in:Japanese economic review (Oxford, England) Vol. 75; no. 3; pp. 407 - 447
Main Authors: Kitagawa, Toru, Rowley, Jeff
Format: Journal Article
Language:English
Published: Singapore Springer Nature Singapore 01.07.2024
Springer Nature B.V
Subjects:
ISSN:1352-4739, 1468-5876
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Be the first to leave a comment!
You must be logged in first