DUELING BANDIT PROBLEMS.
| Title: | DUELING BANDIT PROBLEMS. |
|---|---|
| Authors: | Peköz, Erol, Ross, Sheldon M., Zhang, Zhengyu |
| Source: | Probability in the Engineering & Informational Sciences; Apr2022, Vol. 36 Issue 2, p264-275, 12p |
| Subject Terms: | ROBBERS, EDUCATIONAL games |
| Abstract: | There is a set of n bandits and at every stage, two of the bandits are chosen to play a game, with the result of a game being learned. In the "weak regret problem," we suppose there is a "best" bandit that wins each game it plays with probability at least p > 1/2, with the value of p being unknown. The objective is to choose bandits to maximize the number of times that one of the competitors is the best bandit. In the "strong regret problem," we suppose that bandit i has unknown value v |
| Copyright: | Copyright of Probability in the Engineering & Informational Sciences is the property of Cambridge University Press and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.) |
| Database: | Complementary Index |
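To make the weak regret setting from the abstract concrete, here is a minimal simulation sketch. It is not the policy analyzed by Peköz, Ross, and Zhang. It uses a hypothetical "winner stays" rule, in which the winner of each game keeps playing against a uniformly random challenger. The function names `play` and `winner_stays_weak_regret` are illustrative only. Two further simplifications are assumptions: the best bandit wins with probability exactly p, where the abstract says "at least p", and every game between two non-best bandits is a fair coin flip. Weak regret is counted as the number of stages in which neither competitor is the best bandit.

```python
import random


def play(i, j, best, p, rng):
    """Return the winner of a game between bandits i and j.

    Assumption: the best bandit wins with probability exactly p; games
    between two non-best bandits are fair coin flips.
    """
    if best in (i, j):
        other = j if i == best else i
        return best if rng.random() < p else other
    return i if rng.random() < 0.5 else j


def winner_stays_weak_regret(n, p, horizon, seed=0):
    """Simulate a hypothetical 'winner stays' policy for `horizon` stages.

    Returns the weak regret: the number of stages in which neither of the
    two bandits chosen to play is the best bandit.
    """
    rng = random.Random(seed)
    best = rng.randrange(n)      # identity of the best bandit, unknown to the policy
    champion = rng.randrange(n)  # initial champion chosen at random
    regret = 0
    for _ in range(horizon):
        # Pick a challenger uniformly among the other n - 1 bandits.
        challenger = rng.choice([k for k in range(n) if k != champion])
        if best not in (champion, challenger):
            regret += 1
        # The winner of this game becomes the champion for the next stage.
        champion = play(champion, challenger, best, p, rng)
    return regret


if __name__ == "__main__":
    print(winner_stays_weak_regret(n=10, p=0.7, horizon=1000, seed=1))
```

Under this rule, once the best bandit becomes champion it tends to stay there, because it wins each defense with probability p > 1/2. Regret therefore accrues mainly before the best bandit is first drawn as a challenger and after its occasional losses. With n = 2 the best bandit plays in every game, so the weak regret is always zero.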