Quantum-Enhanced Reinforcement Learning for Finite-Episode Games with Discrete State Spaces
| Published in: | Frontiers in Physics, Vol. 5 |
|---|---|
| Main authors: | Neukart, Florian; Von Dollen, David; Seidel, Christian; Compostella, Gabriele |
| Format: | Journal Article |
| Language: | English |
| Published: | Frontiers Media S.A., 1 February 2018 |
| ISSN: | 2296-424X |

| Abstract | Quantum annealing algorithms belong to the class of metaheuristic tools applicable to solving binary optimization problems. Hardware implementations of quantum annealing, such as the quantum annealing machines produced by D-Wave Systems [1], have been the subject of multiple analyses with the aim of characterizing the technology's usefulness for optimization and sampling tasks [2–16]. Here, we present a way to partially embed Monte Carlo policy iteration for finding an optimal policy on random observations, as well as a way to embed n sub-optimal state-value functions for approximating an improved state-value function given a policy, for finite-horizon games with discrete state spaces, on a D-Wave 2000Q quantum processing unit (QPU). We explain how both problems can be expressed as quadratic unconstrained binary optimization (QUBO) problems, and show that quantum-enhanced Monte Carlo policy evaluation allows for finding equivalent or better state-value functions for a given policy with the same number of episodes compared to a purely classical Monte Carlo algorithm. Additionally, we describe a quantum-classical policy learning algorithm. Our first and foremost aim is to explain how to represent and solve parts of these problems with the help of the QPU, not to prove supremacy over every existing classical policy evaluation algorithm. |
|---|---|
| Author | Neukart, Florian; Von Dollen, David; Seidel, Christian; Compostella, Gabriele |
| DOI | 10.3389/fphy.2017.00071 |
| Discipline | Physics |
| EISSN | 2296-424X |
| ISICitedReferencesCount | 21 |
| ISSN | 2296-424X |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Language | English |
| OpenAccessLink | https://doaj.org/article/d30aff00d1c84d068c253e39f400d5b2 |
| PublicationDate | 2018-02-01 |
| PublicationTitle | Frontiers in Physics |
| PublicationYear | 2018 |
| Publisher | Frontiers Media S.A |
| References | [1] (details not included in this record) |
| | [2] Benedetti (2015). Estimation of effective temperatures in quantum annealers for sampling applications: a case study with possible applications in deep learning. Phys Rev A 94:022308. doi: 10.1103/PhysRevA.94.022308 |
| | [3] Smelyanskiy (2015). Quantum annealing via environment-mediated quantum diffusion. Phys Rev Lett 118:066802. doi: 10.1103/PhysRevLett.118.066802 |
| | [4] Venturelli (2015). Quantum annealing implementation of job-shop scheduling. arXiv:1506.08479v2 [quant-ph] |
| | [5] Jiang (2017). Non-commuting two-local Hamiltonians for quantum error suppression. Quantum Inf Process 16:89. doi: 10.1007/s11128-017-1527-9 |
| | [6] Isakov (2015). Understanding quantum tunneling through quantum Monte Carlo simulations. Phys Rev Lett 117:180402. doi: 10.1103/PhysRevLett.117.180402 |
| | [7] O'Gorman (2015). Bayesian network structure learning using quantum annealing. Eur Phys J Spec Top 224:163. doi: 10.1140/epjst/e2015-02349-9 |
| | [8] Rieffel (2015). A case study in programming a quantum annealer for hard operational planning problems. Quantum Inf Process 14:1. doi: 10.1007/s11128-014-0892-x |
| | [9] Venturelli (2014). Quantum optimization of fully-connected spin glasses. Phys Rev X 5:031040. doi: 10.1103/PhysRevX.5.031040 |
| | [10] Perdomo-Ortiz (2015). A quantum annealing approach for fault detection and diagnosis of graph-based systems. Eur Phys J Spec Top 224:131. doi: 10.1140/epjst/e2015-02347-y |
| | [11] Boixo (2014). Evidence for quantum annealing with more than one hundred qubits. Nat Phys 10:218. doi: 10.1038/nphys2900 |
| | [12] Babbush (2012). Construction of energy functions for lattice heteropolymer models: efficient encodings for constraint satisfaction programming and quantum annealing. Advances in Chemical Physics. arXiv:1211.3422v2 [quant-ph] |
| | [13] Smolin (2013). Classical signature of quantum annealing. Front Phys 2:52. doi: 10.3389/fphy.2014.00052 |
| | [14] Perdomo-Ortiz (2012). Finding low-energy conformations of lattice protein models by quantum annealing. Sci Rep 2:571. doi: 10.1038/srep00571 |
| | [15] (details not included in this record) |
| | [16] Neukart (2017). Traffic flow optimization using a quantum annealer. Front ICT 4:29. doi: 10.3389/fict.2017.00029 |
| | [17] Lucas (2014). Ising formulations of many NP problems. Front Phys 2:5. doi: 10.3389/fphy.2014.00005 |
| | [18] Korenkevych (2016). Benchmarking quantum hardware for training of fully visible Boltzmann machines. arXiv:1611.04528v1 [quant-ph] |
| | [19] Lanting (2013). Entanglement in a quantum annealing processor. Phys Rev X 4:021041. doi: 10.1103/PhysRevX.4.021041 |
| | [20] Wiering (2012). Reinforcement learning and Markov decision processes. In: Reinforcement Learning. Adaptation, Learning, and Optimization, p. 3 |
| | [21] Sutton (1998). Reinforcement Learning: An Introduction |
| | [22] Neukart (2013). On quantum computers and artificial neural networks. Signal Process Res 2:1 |
| | [23] Neukart (2014). Operations on quantum physical artificial neural structures. Proc Eng 69:1509. doi: 10.1016/j.proeng.2014.03.148 |
| | [24] (details not included in this record) |
| | [25] Neukart (2017). Quantum physics and the biological brain. In: Reverse Engineering the Mind, p. 221. doi: 10.1007/978-3-658-16176-7_8 |
| | [26] Levit (2017). Free-energy-based reinforcement learning using a quantum processor |
| | [27] Crawford (2016). Reinforcement learning using quantum Boltzmann machines. arXiv:1612.05695 [quant-ph] |
| SubjectTerms | quantum annealing; quantum computing; quantum-classical; quantum-enhanced algorithms; reinforcement learning |
| Title | Quantum-Enhanced Reinforcement Learning for Finite-Episode Games with Discrete State Spaces |
| URI | https://doaj.org/article/d30aff00d1c84d068c253e39f400d5b2 |
| Volume | 5 |
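
The abstract states that both policy-evaluation problems are expressed as QUBO problems and solved on a D-Wave 2000Q. This record does not include the paper's actual encoding, so the following Python is only a generic sketch of the QUBO form itself: minimize E(x) = sum over i ≤ j of Q_ij x_i x_j over binary vectors x. An exhaustive search stands in for the QPU, and a standard one-hot penalty (see, e.g., Lucas [17]) encodes the constraint "select exactly one of n candidates". All function names and the toy cost-selection problem are hypothetical illustrations, not the authors' formulation.

```python
import itertools
from typing import Dict, List, Tuple

# Hypothetical helpers. A QUBO is stored sparsely as {(i, j): weight};
# diagonal entries (i, i) act as linear terms because x_i * x_i == x_i for x_i in {0, 1}.
Qubo = Dict[Tuple[int, int], float]


def qubo_energy(q: Qubo, x: List[int]) -> float:
    """Return E(x) = sum over (i, j) of Q_ij * x_i * x_j for a binary vector x."""
    return sum(w * x[i] * x[j] for (i, j), w in q.items())


def brute_force_minimum(q: Qubo, n: int) -> Tuple[List[int], float]:
    """Enumerate all 2**n binary vectors and return the first lowest-energy one.

    Stand-in for a quantum annealer, which instead samples low-energy states
    heuristically; exhaustive search is only usable for very small n.
    """
    best_x: List[int] = []
    best_e = float("inf")
    for bits in itertools.product((0, 1), repeat=n):
        e = qubo_energy(q, list(bits))
        if e < best_e:
            best_x, best_e = list(bits), e
    return best_x, best_e


def one_hot_penalty(n: int, strength: float) -> Qubo:
    """QUBO for strength * (1 - sum_i x_i)**2 with the constant term dropped.

    Using x_i**2 == x_i:
    (1 - sum_i x_i)**2 = 1 - sum_i x_i + 2 * sum_{i<j} x_i * x_j
    """
    q: Qubo = {}
    for i in range(n):
        q[(i, i)] = -strength
        for j in range(i + 1, n):
            q[(i, j)] = 2.0 * strength
    return q


def merge(*qubos: Qubo) -> Qubo:
    """Add several QUBOs coefficient by coefficient."""
    out: Qubo = {}
    for q in qubos:
        for key, w in q.items():
            out[key] = out.get(key, 0.0) + w
    return out


# Toy example: pick exactly one of three options with costs 3, 1 and 2.
costs = [3.0, 1.0, 2.0]
linear: Qubo = {(i, i): c for i, c in enumerate(costs)}
problem = merge(linear, one_hot_penalty(len(costs), strength=10.0))
x, e = brute_force_minimum(problem, len(costs))
print(x, e)  # [0, 1, 0] -9.0
```

Because the penalty's constant term is dropped, the reported energy (-9.0) is offset by the penalty strength; adding the 10.0 back gives the selected cost of 1.0. The strength is chosen large relative to the costs so that violating the one-hot constraint never lowers the energy. On hardware, a coefficient dictionary of this shape would be submitted to an annealer rather than enumerated; D-Wave's Ocean SDK samplers accept QUBOs in a similar `{(i, j): weight}` form through `sample_qubo`.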