Quantum-Enhanced Reinforcement Learning for Finite-Episode Games with Discrete State Spaces

Bibliographic Details
Published in: Frontiers in Physics, Vol. 5
Main authors: Neukart, Florian; Von Dollen, David; Seidel, Christian; Compostella, Gabriele
Format: Journal Article
Language: English
Published: Frontiers Media S.A, 1 February 2018
ISSN: 2296-424X
Abstract: Quantum annealing algorithms belong to the class of metaheuristic tools applicable to solving binary optimization problems. Hardware implementations of quantum annealing, such as the quantum annealing machines produced by D-Wave Systems [1], have been the subject of multiple analyses aimed at characterizing the technology's usefulness for optimization and sampling tasks [2–16]. Here, we present a way to partially embed Monte Carlo policy iteration for finding an optimal policy on random observations, and a way to embed n sub-optimal state-value functions for approximating an improved state-value function given a policy, for finite-horizon games with discrete state spaces on a D-Wave 2000Q quantum processing unit (QPU). We explain how both problems can be expressed as quadratic unconstrained binary optimization (QUBO) problems, and show that quantum-enhanced Monte Carlo policy evaluation finds equivalent or better state-value functions for a given policy with the same number of episodes compared to a purely classical Monte Carlo algorithm. Additionally, we describe a quantum-classical policy learning algorithm. Our foremost aim is to explain how to represent and solve parts of these problems with the help of the QPU, not to prove supremacy over every existing classical policy evaluation algorithm.
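To make the QUBO form named in the abstract concrete, the following is a minimal sketch of what "quadratic unconstrained binary optimization" means: minimizing a quadratic function over binary vectors. It is not the authors' formulation of policy evaluation and it does not use any D-Wave API. The function names `qubo_energy` and `brute_force_qubo` and the toy coefficients are hypothetical, and exhaustive enumeration stands in for the annealer purely for illustration.

```python
from itertools import product


def qubo_energy(Q, x):
    """Return the QUBO objective: sum of Q[(i, j)] * x[i] * x[j] over all stored pairs."""
    return sum(w * x[i] * x[j] for (i, j), w in Q.items())


def brute_force_qubo(Q, n):
    """Enumerate all 2**n binary vectors and return (best_x, best_energy).

    Assumption: exhaustive search replaces the quantum annealer here; it is
    only feasible for very small n.
    """
    best_x, best_e = None, float("inf")
    for x in product((0, 1), repeat=n):
        e = qubo_energy(Q, x)
        if e < best_e:
            best_x, best_e = x, e
    return best_x, best_e


# Hypothetical toy instance: selecting either variable is rewarded (-1 on the
# diagonal), selecting both together is penalised (+2 off-diagonal).
Q = {(0, 0): -1.0, (1, 1): -1.0, (0, 1): 2.0}
best_x, best_e = brute_force_qubo(Q, 2)
```

On this toy instance the minimum is reached by selecting exactly one of the two variables. On hardware, the same coefficient dictionary would be handed to a sampler instead of being enumerated.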
DOI: 10.3389/fphy.2017.00071
Open access link: https://doaj.org/article/d30aff00d1c84d068c253e39f400d5b2
References
[1] (no details in this record)
[2] Benedetti (2015). Estimation of effective temperatures in quantum annealers for sampling applications: a case study with possible applications in deep learning. Phys Rev A 94:022308. doi: 10.1103/PhysRevA.94.022308
[3] Smelyanskiy (2015). Quantum annealing via environment-mediated quantum diffusion. Phys Rev Lett 118:066802. doi: 10.1103/PhysRevLett.118.066802
[4] Venturelli (2015). Quantum annealing implementation of job-shop scheduling. arXiv:1506.08479v2 [quant-ph]
[5] Jiang (2017). Non-commuting two-local Hamiltonians for quantum error suppression. Quant Inf Process 16:89. doi: 10.1007/s11128-017-1527-9
[6] Isakov (2015). Understanding quantum tunneling through quantum Monte Carlo simulations. Phys Rev Lett 117:180402. doi: 10.1103/PhysRevLett.117.180402
[7] O'Gorman (2015). Bayesian network structure learning using quantum annealing. Eur Phys J Spec Top 224:163. doi: 10.1140/epjst/e2015-02349-9
[8] Rieffel (2015). A case study in programming a quantum annealer for hard operational planning problems. Quantum Inf Process 14:1. doi: 10.1007/s11128-014-0892-x
[9] Venturelli (2014). Quantum optimization of fully-connected spin glasses. Phys Rev X 5:031040. doi: 10.1103/PhysRevX.5.031040
[10] Perdomo-Ortiz (2015). A quantum annealing approach for fault detection and diagnosis of graph-based systems. Eur Phys J Spec Top 224:131. doi: 10.1140/epjst/e2015-02347-y
[11] Boixo (2014). Evidence for quantum annealing with more than one hundred qubits. Nat Phys 10:218. doi: 10.1038/nphys2900
[12] Babbush (2012). Construction of energy functions for lattice heteropolymer models: efficient encodings for constraint satisfaction programming and quantum annealing advances in chemical physics. arXiv:1211.3422v2 [quant-ph]
[13] Smolin (2013). Classical signature of quantum annealing. Front Phys 2:52. doi: 10.3389/fphy.2014.00052
[14] Perdomo-Ortiz (2012). Finding low-energy conformations of lattice protein models by quantum annealing. Sci Rep 2:571. doi: 10.1038/srep00571
[15] (no details in this record)
[16] Neukart (2017). Traffic flow optimization using a quantum annealer. Front ICT 4:29. doi: 10.3389/fict.2017.00029
[17] Lucas (2014). Ising formulations of many NP problems. Front Phys 2:5. doi: 10.3389/fphy.2014.00005
[18] Korenkevych (2016). Benchmarking quantum hardware for training of fully visible Boltzmann machines. arXiv:1611.04528v1 [quant-ph]
[19] Lanting (2013). Entanglement in a quantum annealing processor. Phys Rev X 4:021041. doi: 10.1103/PhysRevX.4.021041
[20] Wiering (2012). Reinforcement learning and Markov decision processes. In: Reinforcement Learning. Adaptation, Learning, and Optimization, p. 3
[21] Sutton (1998). Reinforcement Learning: An Introduction
[22] Neukart (2013). On quantum computers and artificial neural networks. Signal Process Res 2:1
[23] Neukart (2014). Operations on quantum physical artificial neural structures. Proc Eng 69:1509. doi: 10.1016/j.proeng.2014.03.148
[24] (no details in this record)
[25] Neukart (2017). Quantum physics and the biological brain. In: Reverse Engineering the Mind, p. 221. doi: 10.1007/978-3-658-16176-7_8
[26] Levit (2017). Free-energy-based reinforcement learning using a quantum processor
[27] Crawford (2016). Reinforcement learning using quantum Boltzmann machines. arXiv:1612.05695 [quant-ph]
Subject terms: quantum annealing; quantum computing; quantum-classical; quantum-enhanced algorithms; reinforcement learning