Modeling Variation in Human Feedback with User Inputs: An Exploratory Methodology

Bibliographic Details
Published in: 2024 19th ACM/IEEE International Conference on Human-Robot Interaction (HRI), pp. 303-312
Main authors: Huang, Jindan; Aronson, Reuben M.; Short, Elaine Schaertl
Format: Conference proceeding
Language: English
Published: ACM, 11 March 2024
Description
Summary: To expedite the development of interactive reinforcement learning (IntRL) algorithms, prior work often uses perfect oracles as simulated human teachers to furnish feedback signals. These oracles typically derive from ground-truth knowledge or optimal policies, providing dense and error-free feedback to a robot learner without delay. However, this machine-like feedback behavior fails to accurately represent the diverse patterns observed in human feedback, which may lead to unstable or unexpected algorithm performance in real-world human-robot interaction. To alleviate this limitation of oracles in oversimplifying user behavior, we propose a method for modeling variation in human feedback that can be applied to a standard oracle. We present a model with five dimensions of feedback variation identified in prior work. This model enables the modification of feedback outputs from perfect oracles to introduce more human-like features. We demonstrate how each model attribute can affect the learning performance of an IntRL algorithm through a simulation experiment. We also conduct a proof-of-concept study to illustrate how our model can be populated from people in two ways. The modeling results intuitively present the feedback variation among participants and help to explain the mismatch between oracles and human teachers. Overall, our method is a promising step towards refining simulated oracles by incorporating insights from real users.
CCS Concepts: Human-centered computing → Collaborative and social computing; Computing methodologies → Modeling and simulation.
DOI: 10.1145/3610977.3634925
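
Illustrative sketch: the summary describes modifying the output of a perfect oracle so that it is no longer dense, error-free, and immediate. The record does not name the paper's five dimensions, so the Python sketch below is not the authors' model. It is a minimal, hypothetical wrapper covering only the three properties the summary mentions, and the class name NoisyOracle and the parameters p_respond, p_error, and delay_steps are assumptions made for illustration. Feedback is assumed to be +1 or -1, with None meaning no feedback at that step.

```python
import random
from collections import deque


class NoisyOracle:
    """Hypothetical wrapper that degrades a perfect oracle's feedback.

    Not the model from the paper: it only varies density (p_respond),
    accuracy (p_error), and timing (delay_steps).
    """

    def __init__(self, oracle, p_respond=1.0, p_error=0.0, delay_steps=0, seed=None):
        self.oracle = oracle            # callable (state, action) -> +1 or -1
        self.p_respond = p_respond      # probability of giving any feedback
        self.p_error = p_error          # probability of flipping the sign
        self.rng = random.Random(seed)
        # Pre-fill the queue so the first delay_steps calls return None.
        self.queue = deque([None] * delay_steps)

    def feedback(self, state, action):
        signal = self.oracle(state, action)
        if self.rng.random() >= self.p_respond:
            signal = None               # teacher stays silent this step
        elif self.rng.random() < self.p_error:
            signal = -signal            # teacher gives the wrong sign
        self.queue.append(signal)
        return self.queue.popleft()     # emit the (possibly delayed) signal
```

With the default arguments the wrapper returns exactly what the underlying oracle returns, so each parameter can be varied on its own to see how it affects a learner.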