Sequential Decision Making With Limited Observation Capability: Application to Wireless Networks
This paper studies a generalized class of restless multi-armed bandits with hidden states and cumulative feedback, as opposed to the conventional instantaneous feedback. We call them lazy restless bandits (LRBs) as the events of decision making are sparser than the events of state transition....
| Published in: | IEEE transactions on cognitive communications and networking Vol. 5; no. 2; pp. 237 - 251 |
|---|---|
| Main Authors: | Kaza, Kesav; Meshram, Rahul; Mehta, Varun; Merchant, Shabbir N. |
| Format: | Journal Article |
| Language: | English |
| Published: | Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.06.2019 |
| Subjects: | |
| ISSN: | 2332-7731 |
| Abstract | This paper studies a generalized class of restless multi-armed bandits with hidden states and cumulative feedback, as opposed to the conventional instantaneous feedback. We call them lazy restless bandits (LRBs) because decision events are sparser than state-transition events. Hence, the feedback after each decision event is the cumulative effect of the state transitions that follow it. The states of the arms are hidden from the decision maker, and the rewards for actions are state dependent. The decision maker must choose one arm in each decision interval so that the long-term cumulative reward is maximized. As the states are hidden, the decision maker maintains and updates its belief about them. It is shown that LRBs admit an optimal policy that has a threshold structure in belief space. The Whittle-index policy for solving the LRB problem is analyzed, and indexability of LRBs is shown. Further, closed-form index expressions are provided for two sets of special cases; for more general cases, an algorithm for index computation is provided. An extensive simulation study is presented in which the Whittle-index, modified Whittle-index, and myopic policies are compared. The Lagrangian relaxation of the problem provides an upper bound on the optimal value function; it is used to assess the degree of sub-optimality of the various policies. |
|---|---|
| Author | Merchant, Shabbir N.; Meshram, Rahul; Mehta, Varun; Kaza, Kesav |
| Author_xml | 1. Kaza, Kesav (ORCID 0000-0002-9051-4624; krk@ee.iitb.ac.in; Department of Electrical Engineering, Indian Institute of Technology Bombay, Mumbai, India); 2. Meshram, Rahul (ORCID 0000-0003-3966-3269; rahulmeshram07@gmail.com; University of Waterloo, Waterloo, Canada); 3. Mehta, Varun (varun.baps@gmail.com; Department of Electrical Engineering, Indian Institute of Technology Bombay, Mumbai, India); 4. Merchant, Shabbir N. (merchant@ee.iitb.ac.in; Department of Electrical Engineering, Indian Institute of Technology Bombay, Mumbai, India) |
| CODEN | ITCCG7 |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2019 |
| DOI | 10.1109/TCCN.2019.2898000 |
| Discipline | Engineering |
| EISSN | 2332-7731 |
| EndPage | 251 |
| ExternalDocumentID | 10_1109_TCCN_2019_2898000 8636263 |
| Genre | orig-research |
| ISICitedReferencesCount | 8 |
| ISSN | 2332-7731 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Issue | 2 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| ORCID | 0000-0003-3966-3269 0000-0002-9051-4624 |
| PageCount | 15 |
| PublicationCentury | 2000 |
| PublicationDate | 2019-06-01 |
| PublicationDateYYYYMMDD | 2019-06-01 |
| PublicationDate_xml | – month: 06 year: 2019 text: 2019-06-01 day: 01 |
| PublicationDecade | 2010 |
| PublicationPlace | Piscataway |
| PublicationPlace_xml | – name: Piscataway |
| PublicationTitle | IEEE transactions on cognitive communications and networking |
| PublicationTitleAbbrev | TCCN |
| PublicationYear | 2019 |
| Publisher | IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| StartPage | 237 |
| SubjectTerms | Algorithms Computer simulation cumulative feedback Decision making dynamic programming Economic models Fading channels Feedback Indexes Markov processes Optimization Policies Productivity relay selection Relays restless bandits Sequential decision making Upper bounds weakly coupled partially observable Markov decision processes Whittle index Wireless networks |
| Title | Sequential Decision Making With Limited Observation Capability: Application to Wireless Networks |
| URI | https://ieeexplore.ieee.org/document/8636263 https://www.proquest.com/docview/2296108374 |
| Volume | 5 |
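
The abstract describes a decision maker that keeps a belief about each hidden arm state and receives reward accumulated over several state transitions per decision. The following is a minimal sketch of that idea, not the authors' model or algorithm. It assumes each arm is a two-state Markov chain (states 0 and 1), that the belief is the probability of state 1, and that reward is collected after each transition within a decision interval. All names (`propagate_belief`, `expected_cumulative_reward`, `myopic_choice`) and all parameters (`p01`, `p11`, `r0`, `r1`, `steps`) are hypothetical and chosen only for illustration. The selection rule shown is a simple myopic one; the Whittle index analyzed in the paper is not computed here.

```python
from typing import Sequence, Tuple


def propagate_belief(belief: float, p01: float, p11: float, steps: int) -> float:
    """Evolve P(state == 1) through `steps` unobserved transitions.

    p01 is the assumed probability of moving from state 0 to state 1,
    p11 the assumed probability of staying in state 1.
    """
    for _ in range(steps):
        belief = belief * p11 + (1.0 - belief) * p01
    return belief


def expected_cumulative_reward(
    belief: float, p01: float, p11: float, steps: int, r0: float, r1: float
) -> float:
    """Expected reward summed over the transitions of one decision interval.

    Assumption: reward r1 (or r0) is earned after each transition,
    depending on the state the arm lands in.
    """
    total = 0.0
    for _ in range(steps):
        belief = propagate_belief(belief, p01, p11, 1)
        total += belief * r1 + (1.0 - belief) * r0
    return total


def myopic_choice(
    beliefs: Sequence[float],
    params: Sequence[Tuple[float, float, float, float]],
    steps: int,
) -> int:
    """Return the index of the arm with the largest expected interval reward.

    Each entry of `params` is (p01, p11, r0, r1) for one arm.
    """
    scores = [
        expected_cumulative_reward(b, p01, p11, steps, r0, r1)
        for b, (p01, p11, r0, r1) in zip(beliefs, params)
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

With two identical arms and `steps=2`, `myopic_choice([1.0, 0.0], [(0.2, 0.9, 0.0, 1.0)] * 2, 2)` picks the arm whose belief starts higher. Arms that are not played would still have their beliefs advanced with `propagate_belief` before the next decision.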