Sequential Decision Making With Limited Observation Capability: Application to Wireless Networks

Bibliographic details
Published in: IEEE transactions on cognitive communications and networking, Volume 5; Issue 2; pp. 237-251
Main authors: Kaza, Kesav; Meshram, Rahul; Mehta, Varun; Merchant, Shabbir N.
Format: Journal Article
Language: English
Publication details: Piscataway: IEEE, 01.06.2019
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 2332-7731
Abstract: This paper studies a generalized class of restless multi-armed bandits with hidden states that allows cumulative feedback, as opposed to the conventional instantaneous feedback. We call them lazy restless bandits (LRBs) because the events of decision making are sparser than the events of state transition. Hence, the feedback after each decision event is the cumulative effect of the state transition events that follow it. The states of the arms are hidden from the decision maker, and the rewards for actions are state dependent. The decision maker needs to choose one arm in each decision interval such that the long-term cumulative reward is maximized. As the states are hidden, the decision maker maintains and updates its belief about them. It is shown that LRBs admit an optimal policy with a threshold structure in belief space. The Whittle-index policy for solving the LRB problem is analyzed, and the indexability of LRBs is shown. Further, closed-form index expressions are provided for two sets of special cases; for more general cases, an algorithm for index computation is provided. An extensive simulation study is presented in which Whittle-index, modified Whittle-index, and myopic policies are compared. The Lagrangian relaxation of the problem provides an upper bound on the optimal value function; it is used to assess the degree of sub-optimality of various policies.
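The belief tracking described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: it assumes each arm is a two-state Markov chain with hypothetical transition probabilities `p01` and `p11`, that `steps` unobserved transitions occur between decisions, and that rewards are given as hypothetical `(r0, r1)` pairs per arm. The myopic rule shown is the generic "largest expected immediate reward" rule and does not model the cumulative feedback or the Whittle index.

```python
from typing import List, Tuple


def propagate_belief(belief: float, p01: float, p11: float, steps: int) -> float:
    """Return Pr(state = 1) after `steps` unobserved transitions.

    belief: current probability that the hidden state is 1.
    p01, p11: assumed probabilities of moving to state 1 from state 0 / state 1.
    """
    for _ in range(steps):
        # Law of total probability over the current hidden state.
        belief = belief * p11 + (1.0 - belief) * p01
    return belief


def myopic_choice(beliefs: List[float], rewards: List[Tuple[float, float]]) -> int:
    """Return the index of the arm with the largest expected immediate reward.

    rewards[i] = (r0, r1): assumed reward of arm i in state 0 and in state 1.
    """
    expected = [b * r1 + (1.0 - b) * r0 for b, (r0, r1) in zip(beliefs, rewards)]
    return max(range(len(expected)), key=expected.__getitem__)


if __name__ == "__main__":
    # Two arms, three state transitions per decision interval (illustrative values).
    beliefs = [propagate_belief(0.5, 0.2, 0.9, 3), propagate_belief(0.5, 0.4, 0.6, 3)]
    print(beliefs, myopic_choice(beliefs, [(0.0, 1.0), (0.0, 1.0)]))
```

With many unobserved transitions the belief approaches the chain's stationary probability `p01 / (1 - p11 + p01)`, which is one way to see why sparse decision events make the arms harder to distinguish.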
Author_xml – sequence: 1
  givenname: Kesav
  orcidid: 0000-0002-9051-4624
  surname: Kaza
  fullname: Kaza, Kesav
  email: krk@ee.iitb.ac.in
  organization: Department of Electrical Engineering, Indian Institute of Technology Bombay, Mumbai, India
– sequence: 2
  givenname: Rahul
  orcidid: 0000-0003-3966-3269
  surname: Meshram
  fullname: Meshram, Rahul
  email: rahulmeshram07@gmail.com
  organization: University of Waterloo, Waterloo, Canada
– sequence: 3
  givenname: Varun
  surname: Mehta
  fullname: Mehta, Varun
  email: varun.baps@gmail.com
  organization: Department of Electrical Engineering, Indian Institute of Technology Bombay, Mumbai, India
– sequence: 4
  givenname: Shabbir N.
  surname: Merchant
  fullname: Merchant, Shabbir N.
  email: merchant@ee.iitb.ac.in
  organization: Department of Electrical Engineering, Indian Institute of Technology Bombay, Mumbai, India
CODEN ITCCG7
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2019
DOI 10.1109/TCCN.2019.2898000
Discipline Engineering
EISSN 2332-7731
EndPage 251
ExternalDocumentID 10_1109_TCCN_2019_2898000
8636263
Genre orig-research
IsPeerReviewed false
IsScholarly true
Issue 2
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
ORCID 0000-0003-3966-3269
0000-0002-9051-4624
PublicationCentury 2000
PublicationDate 2019-06-01
PublicationDateYYYYMMDD 2019-06-01
PublicationDate_xml – month: 06
  year: 2019
  text: 2019-06-01
  day: 01
PublicationDecade 2010
PublicationPlace Piscataway
PublicationPlace_xml – name: Piscataway
PublicationTitle IEEE transactions on cognitive communications and networking
PublicationTitleAbbrev TCCN
PublicationYear 2019
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 237
SubjectTerms Algorithms
Computer simulation
cumulative feedback
Decision making
dynamic programming
Economic models
Fading channels
Feedback
Indexes
Markov processes
Optimization
Policies
Productivity
relay selection
Relays
restless bandits
Sequential decision making
Upper bounds
weakly coupled partially observable Markov decision processes
Whittle index
Wireless networks
Title Sequential Decision Making With Limited Observation Capability: Application to Wireless Networks
URI https://ieeexplore.ieee.org/document/8636263
https://www.proquest.com/docview/2296108374
Volume 5