Relating Human Error–Based Learning to Modern Deep RL Algorithms.

Saved in:
Bibliographic Details
Title: Relating Human Error–Based Learning to Modern Deep RL Algorithms.
Authors: Garibbo, Michele, Ludwig, Casimir J. H., Lepora, Nathan F., Aitchison, Laurence
Source: Neural Computation; Jan2025, Vol. 37 Issue 1, p128-159, 32p
Subject Terms: DEEP reinforcement learning, LEARNING, DEEP learning, ALGORITHMS
Abstract: In human error–based learning, the size and direction of a scalar error (i.e., the "directed error") are used to update future actions. Modern deep reinforcement learning (RL) methods perform a similar operation but in terms of scalar rewards. Despite this similarity, the relationship between action updates of deep RL and human error–based learning has yet to be investigated. Here, we systematically compare the three major families of deep RL algorithms to human error–based learning. We show that all three deep RL approaches are qualitatively different from human error–based learning, as assessed by a mirror-reversal perturbation experiment. To bridge this gap, we developed an alternative deep RL algorithm inspired by human error–based learning, model-based deterministic policy gradients (MB-DPG). We showed that MB-DPG captures human error–based learning under mirror-reversal and rotational perturbations and that MB-DPG learns faster than canonical model-free algorithms on complex arm-based reaching tasks, while being more robust to (forward) model misspecification than model-based RL. [ABSTRACT FROM AUTHOR]
Copyright of Neural Computation is the property of MIT Press and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Complementary Index
Description
Abstract:In human error–based learning, the size and direction of a scalar error (i.e., the "directed error") are used to update future actions. Modern deep reinforcement learning (RL) methods perform a similar operation but in terms of scalar rewards. Despite this similarity, the relationship between action updates of deep RL and human error–based learning has yet to be investigated. Here, we systematically compare the three major families of deep RL algorithms to human error–based learning. We show that all three deep RL approaches are qualitatively different from human error–based learning, as assessed by a mirror-reversal perturbation experiment. To bridge this gap, we developed an alternative deep RL algorithm inspired by human error–based learning, model-based deterministic policy gradients (MB-DPG). We showed that MB-DPG captures human error–based learning under mirror-reversal and rotational perturbations and that MB-DPG learns faster than canonical model-free algorithms on complex arm-based reaching tasks, while being more robust to (forward) model misspecification than model-based RL. [ABSTRACT FROM AUTHOR]
ISSN:08997667
DOI:10.1162/neco_a_01721