A novel individually rational objective in multi-agent multi-armed bandits: Algorithms and regret bounds

Detailed bibliography
Published in: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, Vol. 2020-May, p. 1395
Main authors: Tossou, Aristide; Dimitrakakis, Christos; Rzepecki, Jaroslaw; Hofmann, K.
Format: Conference paper
Language: English
Published: 2020
ISSN: 1558-2914, 1548-8403
Abstract: We study a two-player stochastic multi-armed bandit (MAB) problem with different expected rewards for each player, a generalisation of two-player general-sum repeated games to stochastic rewards. Our aim is to find the egalitarian bargaining solution (EBS) for the repeated game, which can lead to much higher rewards than the maximin value of both players. Our main contribution is the derivation of an algorithm, UCRG, that achieves, simultaneously for both players, a high-probability regret bound of order Õ(T^{2/3}) after any T rounds of play. We demonstrate that our upper bound is nearly optimal by proving a lower bound of Ω(T^{2/3}) for any algorithm. Experiments confirm our theoretical results and the superiority of UCRG over the well-known explore-then-commit heuristic.
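The abstract does not define the EBS. As an illustration only, assuming the standard definition (pick a probability distribution over joint arms that maximises the smaller of the two players' expected gains over a disagreement point, here assumed to be their maximin values), the Python sketch below computes it exactly when the expected rewards are known. It is not UCRG and not the authors' code; the function name egalitarian_bargaining and its arguments are hypothetical, and the maximin values are passed in rather than computed.

```python
from itertools import combinations


def egalitarian_bargaining(r1, r2, v1, v2):
    """Hypothetical helper: maximise min(U1 - v1, U2 - v2) over
    probability distributions w on joint arms, where Ui = sum_a w[a] * ri[a].

    r1, r2: known expected rewards of players 1 and 2 for each joint arm.
    v1, v2: disagreement point (assumed here to be the maximin values).
    Returns (best_gain, best_weights).
    """
    if len(r1) != len(r2):
        raise ValueError("r1 and r2 must have one entry per joint arm")
    n = len(r1)
    best_gain, best_w = float("-inf"), None

    def consider(w):
        # Evaluate the egalitarian objective for one candidate distribution.
        nonlocal best_gain, best_w
        u1 = sum(wi * x for wi, x in zip(w, r1))
        u2 = sum(wi * y for wi, y in zip(w, r2))
        gain = min(u1 - v1, u2 - v2)
        if gain > best_gain:
            best_gain, best_w = gain, w

    # Candidate 1: every pure joint arm (a vertex of the payoff set).
    for a in range(n):
        w = [0.0] * n
        w[a] = 1.0
        consider(w)

    # Candidate 2: on the segment between any two joint arms, the mixture
    # where both players' gains are equal. The optimum is either a vertex
    # or such a point on the boundary of the achievable payoff set.
    for a, b in combinations(range(n), 2):
        da = (r1[a] - v1) - (r2[a] - v2)  # gain difference at arm a
        db = (r1[b] - v1) - (r2[b] - v2)  # gain difference at arm b
        if da == db:
            continue  # difference is constant along the segment
        p = db / (db - da)  # solves p * da + (1 - p) * db = 0
        if 0.0 <= p <= 1.0:
            w = [0.0] * n
            w[a] = p
            w[b] = 1.0 - p
            consider(w)
    return best_gain, best_w


# Battle-of-the-sexes style example; joint arms ordered (A,A), (B,B), (A,B), (B,A).
gain, weights = egalitarian_bargaining([2, 1, 0, 0], [1, 2, 0, 0], 2 / 3, 2 / 3)
print(gain, weights)
```

In this example both players' maximin values are 2/3. The sketch returns an equal mixture of the first two joint arms, so each player receives 1.5 in expectation, well above the maximin value. Enumerating single arms and two-arm mixtures is exact but quadratic in the number of joint arms. With unknown, stochastic rewards the means would have to be estimated, which is the learning problem the paper addresses.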
Author affiliations:
Tossou, Aristide (Data Science)
Dimitrakakis, Christos (University of Oslo)
Rzepecki, Jaroslaw (Microsoft Research)
Hofmann, K. (Microsoft Research)
Record in the Swedish Publication Index (Chalmers tekniska högskola): https://research.chalmers.se/publication/521346
DOI: 10.5555/3398761.3398922
Discipline: Computer Science
Subjects: Egalitarian bargaining solution; Individual rationality; Multi-armed bandits; Safety