Human-in-the-Loop Reinforcement Learning: A Survey and Position on Requirements, Challenges, and Opportunities
Artificial intelligence (AI) and especially reinforcement learning (RL) have the potential to enable agents to learn and perform tasks autonomously with superhuman performance. However, we consider RL as fundamentally a Human-in-the-Loop (HITL) paradigm, even when an agent eventually performs its ta...
| Published in: | The Journal of artificial intelligence research, Vol. 79, pp. 359-415 |
|---|---|
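| Main authors: | Retzlaff, Carl Orge; Das, Srijita; Wayllace, Christabel; Mousavi, Payam; Afshari, Mohammad; Yang, Tianpei; Saranti, Anna; Angerschmid, Alessa; Taylor, Matthew E.; Holzinger, Andreas |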
| Format: | Journal Article |
| Language: | English |
| Published: | San Francisco: AI Access Foundation, 2024-01-01 |
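| Subjects: | Agents (artificial intelligence); Artificial intelligence; Explainable artificial intelligence; Feedback; Machine learning; User experience; Workflow |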
| ISSN: | 1076-9757, 1943-5037 |
| Online access: | Full text |
| Abstract | Artificial intelligence (AI) and especially reinforcement learning (RL) have the potential to enable agents to learn and perform tasks autonomously with superhuman performance. However, we consider RL as fundamentally a Human-in-the-Loop (HITL) paradigm, even when an agent eventually performs its task autonomously. In cases where the reward function is challenging or impossible to define, HITL approaches are considered particularly advantageous. The application of Reinforcement Learning from Human Feedback (RLHF) in systems such as ChatGPT demonstrates the effectiveness of optimizing for user experience and integrating their feedback into the training loop. In HITL RL, human input is integrated during the agent’s learning process, allowing iterative updates and fine-tuning based on human feedback, thus enhancing the agent’s performance. Since the human is an essential part of this process, we argue that human-centric approaches are the key to successful RL, a fact that has not been adequately considered in the existing literature. This paper aims to inform readers about current explainability methods in HITL RL. It also shows how the application of explainable AI (xAI) and specific improvements to existing explainability approaches can enable a better human-agent interaction in HITL RL for all types of users, whether for lay people, domain experts, or machine learning specialists. Accounting for the workflow in HITL RL and based on software and machine learning methodologies, this article identifies four phases for human involvement for creating HITL RL systems: (1) Agent Development, (2) Agent Learning, (3) Agent Evaluation, and (4) Agent Deployment. We highlight human involvement, explanation requirements, new challenges, and goals for each phase. We furthermore identify low-risk, high-return opportunities for explainability research in HITL RL and present long-term research goals to advance the field. Finally, we propose a vision of human-robot collaboration that allows both parties to reach their full potential and cooperate effectively. |
|---|---|
| Author | Retzlaff, Carl Orge; Saranti, Anna; Holzinger, Andreas; Mousavi, Payam; Yang, Tianpei; Angerschmid, Alessa; Taylor, Matthew E.; Afshari, Mohammad; Das, Srijita; Wayllace, Christabel |
| Author_xml | (1) Retzlaff, Carl Orge; (2) Das, Srijita; (3) Wayllace, Christabel; (4) Mousavi, Payam; (5) Afshari, Mohammad; (6) Yang, Tianpei; (7) Saranti, Anna; (8) Angerschmid, Alessa; (9) Taylor, Matthew E.; (10) Holzinger, Andreas |
| CitedBy_id | crossref_primary_10_1007_s10209_024_01123_0 crossref_primary_10_1007_s10209_024_01163_6 crossref_primary_10_1016_j_progpolymsci_2024_101874 crossref_primary_10_3390_systems13090783 crossref_primary_10_1080_1553118X_2025_2483675 crossref_primary_10_1109_ACCESS_2024_3395532 crossref_primary_10_1016_j_cobeha_2025_101482 crossref_primary_10_3390_technologies12120259 crossref_primary_10_1109_ACCESS_2024_3504735 crossref_primary_10_1109_ACCESS_2025_3556187 crossref_primary_10_1080_2331186X_2024_2412492 crossref_primary_10_1007_s10660_024_09876_9 crossref_primary_10_3390_s24030798 crossref_primary_10_1371_journal_pone_0320777 crossref_primary_10_1002_ange_202513147 crossref_primary_10_1007_s10462_025_11255_1 crossref_primary_10_1109_ACCESS_2024_3401547 crossref_primary_10_1016_j_imu_2024_101587 crossref_primary_10_1007_s12555_025_0127_1 crossref_primary_10_1007_s10729_025_09699_6 crossref_primary_10_1007_s10342_024_01673_1 crossref_primary_10_1016_j_eswa_2025_127794 crossref_primary_10_1080_10549811_2025_2513220 crossref_primary_10_1088_1748_9326_ad959f crossref_primary_10_3390_math12071024 crossref_primary_10_1016_j_mimet_2025_107232 crossref_primary_10_1007_s00299_024_03294_9 crossref_primary_10_1016_j_aei_2025_103864 crossref_primary_10_1145_3670685 crossref_primary_10_1109_JPROC_2025_3584656 crossref_primary_10_1007_s00521_024_10512_8 crossref_primary_10_1007_s11227_025_07386_5 crossref_primary_10_1002_anie_202513147 crossref_primary_10_1038_s41598_024_72748_7 crossref_primary_10_3390_admsci14070152 crossref_primary_10_1016_j_neucom_2025_129780 crossref_primary_10_1016_j_knosys_2024_112551 |
| ContentType | Journal Article |
| Copyright | 2024. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the associated terms available at https://www.jair.org/index.php/jair/about |
| DBID | AAYXX CITATION 8FE 8FG ABUWG AFKRA ARAPS AZQEC BENPR BGLVJ CCPQU COVID DWQXO GNUQQ HCIFZ JQ2 K7- P62 PHGZM PHGZT PIMPY PKEHL PQEST PQGLB PQQKQ PQUKI PRINS |
| DOI | 10.1613/jair.1.15348 |
| DatabaseName | CrossRef ProQuest SciTech Collection ProQuest Technology Collection ProQuest Central (Alumni) ProQuest Central UK/Ireland ProQuest SciTech Premium Collection Technology Collection Advanced Technologies & Aerospace Collection ProQuest Central Essentials ProQuest Central ProQuest Technology Collection ProQuest One Coronavirus Research Database ProQuest Central Korea ProQuest Central Student SciTech Premium Collection ProQuest Computer Science Collection Computer Science Database ProQuest Advanced Technologies & Aerospace Collection ProQuest Central Premium ProQuest One Academic (New) Publicly Available Content Database ProQuest One Academic Middle East (New) ProQuest One Academic Eastern Edition (DO NOT USE) ProQuest One Applied & Life Sciences ProQuest One Academic (retired) ProQuest One Academic UKI Edition ProQuest Central China |
| DatabaseTitle | CrossRef Publicly Available Content Database Advanced Technologies & Aerospace Collection Computer Science Database ProQuest Central Student Technology Collection ProQuest One Academic Middle East (New) ProQuest Advanced Technologies & Aerospace Collection ProQuest Central Essentials ProQuest Computer Science Collection ProQuest One Academic Eastern Edition Coronavirus Research Database ProQuest Central (Alumni Edition) SciTech Premium Collection ProQuest One Community College ProQuest Technology Collection ProQuest SciTech Collection ProQuest Central China ProQuest Central ProQuest One Applied & Life Sciences ProQuest One Academic UKI Edition ProQuest Central Korea ProQuest Central (New) ProQuest One Academic ProQuest One Academic (New) |
| DatabaseTitleList | Publicly Available Content Database CrossRef |
| Database_xml | – sequence: 1 dbid: PIMPY name: Publicly Available Content Database url: http://search.proquest.com/publiccontent sourceTypes: Aggregation Database |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISSN | 1076-9757 1943-5037 |
| EndPage | 415 |
| ExternalDocumentID | 10_1613_jair_1_15348 |
| GroupedDBID | .DC 29J 2WC 5GY 5VS AAKMM AAKPC AALFJ AAYFX AAYXX ACGFO ACM ADBBV ADBSK ADMLS AEFXT AEJOY AENEX AFFHD AFKRA AFWXC AKRVB ALMA_UNASSIGNED_HOLDINGS AMVHM ARAPS BCNDV BENPR BGLVJ CCPQU CITATION E3Z EBS EJD F5P FRJ FRP GROUPED_DOAJ GUFHI HCIFZ K7- KQ8 LHSKQ LPJ OK1 OVT P2P PHGZM PHGZT PIMPY PQGLB RNS TR2 XSB 8FE 8FG ABUWG AZQEC COVID DWQXO GNUQQ JQ2 P62 PKEHL PQEST PQQKQ PQUKI PRINS |
| ID | FETCH-LOGICAL-c301t-25e3461ad833dd9ce9e9dd7a011ce969d4f49f09c936b824e6e19c67bc9387d33 |
| IEDL.DBID | K7- |
| ISICitedReferencesCount | 55 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001157178000001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1076-9757 |
| IngestDate | Fri Jul 25 21:17:42 EDT 2025; Tue Nov 18 22:34:28 EST 2025; Sat Nov 29 05:27:06 EST 2025 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-c301t-25e3461ad833dd9ce9e9dd7a011ce969d4f49f09c936b824e6e19c67bc9387d33 |
| Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
| OpenAccessLink | https://www.proquest.com/docview/3184964722?pq-origsite=%requestingapplication% |
| PQID | 3184964722 |
| PQPubID | 5160723 |
| PageCount | 57 |
| ParticipantIDs | proquest_journals_3184964722 crossref_citationtrail_10_1613_jair_1_15348 crossref_primary_10_1613_jair_1_15348 |
| PublicationCentury | 2000 |
| PublicationDate | 2024-01-01 |
| PublicationDateYYYYMMDD | 2024-01-01 |
| PublicationDate_xml | – month: 01 year: 2024 text: 2024-01-01 day: 01 |
| PublicationDecade | 2020 |
| PublicationPlace | San Francisco |
| PublicationPlace_xml | – name: San Francisco |
| PublicationTitle | The Journal of artificial intelligence research |
| PublicationYear | 2024 |
| Publisher | AI Access Foundation |
| Publisher_xml | – name: AI Access Foundation |
| SSID | ssj0019428 |
| Score | 2.6444902 |
| Snippet | Artificial intelligence (AI) and especially reinforcement learning (RL) have the potential to enable agents to learn and perform tasks autonomously with... |
| SourceID | proquest crossref |
| SourceType | Aggregation Database Enrichment Source Index Database |
| StartPage | 359 |
| SubjectTerms | Agents (artificial intelligence); Artificial intelligence; Explainable artificial intelligence; Feedback; Machine learning; User experience; Workflow |
| Title | Human-in-the-Loop Reinforcement Learning: A Survey and Position on Requirements, Challenges, and Opportunities |
| URI | https://www.proquest.com/docview/3184964722 |
| Volume | 79 |
| WOSCitedRecordID | wos001157178000001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVAON databaseName: DOAJ Directory of Open Access Journals customDbUrl: eissn: 1076-9757 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0019428 issn: 1076-9757 databaseCode: DOA dateStart: 19930101 isFulltext: true titleUrlDefault: https://www.doaj.org/ providerName: Directory of Open Access Journals – providerCode: PRVPQU databaseName: Computer Science Database customDbUrl: eissn: 1076-9757 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0019428 issn: 1076-9757 databaseCode: K7- dateStart: 19930101 isFulltext: true titleUrlDefault: http://search.proquest.com/compscijour providerName: ProQuest – providerCode: PRVPQU databaseName: ProQuest Central customDbUrl: eissn: 1076-9757 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0019428 issn: 1076-9757 databaseCode: BENPR dateStart: 19930101 isFulltext: true titleUrlDefault: https://www.proquest.com/central providerName: ProQuest – providerCode: PRVPQU databaseName: Publicly Available Content Database customDbUrl: eissn: 1076-9757 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0019428 issn: 1076-9757 databaseCode: PIMPY dateStart: 19930101 isFulltext: true titleUrlDefault: http://search.proquest.com/publiccontent providerName: ProQuest |
| linkProvider | ProQuest |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Human-in-the-Loop+Reinforcement+Learning%3A+A+Survey+and+Position+on+Requirements%2C+Challenges%2C+and+Opportunities&rft.jtitle=The+Journal+of+artificial+intelligence+research&rft.au=Retzlaff%2C+Carl+Orge&rft.au=Das%2C+Srijita&rft.au=Wayllace%2C+Christabel&rft.au=Mousavi%2C+Payam&rft.date=2024-01-01&rft.issn=1076-9757&rft.eissn=1076-9757&rft.volume=79&rft.spage=359&rft.epage=415&rft_id=info:doi/10.1613%2Fjair.1.15348&rft.externalDBID=n%2Fa&rft.externalDocID=10_1613_jair_1_15348 |