Bandit algorithms for policy learning: methods, implementation, and welfare-performance
Static supervised learning—in which experimental data serves as a training sample for the estimation of an optimal treatment assignment policy—is a commonly assumed framework of policy learning. An arguably more realistic but challenging scenario is a dynamic setting in which the planner performs ex...
| Published in: | Japanese economic review (Oxford, England) Vol. 75; no. 3; pp. 407 - 447 |
|---|---|
| Main Authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: | Singapore: Springer Nature Singapore, 01.07.2024; Springer Nature B.V |
| Subjects: | |
| ISSN: | 1352-4739, 1468-5876 |