Human-in-the-Loop Reinforcement Learning: A Survey and Position on Requirements, Challenges, and Opportunities

Bibliographic details
Published in: The Journal of artificial intelligence research, Vol. 79, pp. 359-415
Main authors: Retzlaff, Carl Orge; Das, Srijita; Wayllace, Christabel; Mousavi, Payam; Afshari, Mohammad; Yang, Tianpei; Saranti, Anna; Angerschmid, Alessa; Taylor, Matthew E.; Holzinger, Andreas
Format: Journal Article
Language: English
Published: San Francisco: AI Access Foundation, 2024-01-01
ISSN: 1076-9757, 1943-5037
Online access: Full text
Abstract: Artificial intelligence (AI) and especially reinforcement learning (RL) have the potential to enable agents to learn and perform tasks autonomously with superhuman performance. However, we consider RL as fundamentally a Human-in-the-Loop (HITL) paradigm, even when an agent eventually performs its task autonomously. In cases where the reward function is challenging or impossible to define, HITL approaches are considered particularly advantageous. The application of Reinforcement Learning from Human Feedback (RLHF) in systems such as ChatGPT demonstrates the effectiveness of optimizing for user experience and integrating their feedback into the training loop. In HITL RL, human input is integrated during the agent’s learning process, allowing iterative updates and fine-tuning based on human feedback, thus enhancing the agent’s performance. Since the human is an essential part of this process, we argue that human-centric approaches are the key to successful RL, a fact that has not been adequately considered in the existing literature. This paper aims to inform readers about current explainability methods in HITL RL. It also shows how the application of explainable AI (xAI) and specific improvements to existing explainability approaches can enable a better human-agent interaction in HITL RL for all types of users, whether for lay people, domain experts, or machine learning specialists. Accounting for the workflow in HITL RL and based on software and machine learning methodologies, this article identifies four phases for human involvement for creating HITL RL systems: (1) Agent Development, (2) Agent Learning, (3) Agent Evaluation, and (4) Agent Deployment. We highlight human involvement, explanation requirements, new challenges, and goals for each phase. We furthermore identify low-risk, high-return opportunities for explainability research in HITL RL and present long-term research goals to advance the field. Finally, we propose a vision of human-robot collaboration that allows both parties to reach their full potential and cooperate effectively.
Copyright 2024. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the associated terms available at https://www.jair.org/index.php/jair/about
DOI: 10.1613/jair.1.15348
Open access: yes
Peer reviewed: yes
Subject terms: Agents (artificial intelligence); Artificial intelligence; Explainable artificial intelligence; Feedback; Machine learning; User experience; Workflow
URI: https://www.proquest.com/docview/3184964722