Enhancing Deep Reinforcement Learning with Executable Specifications

Published in: Proceedings (IEEE/ACM International Conference on Software Engineering Companion. Online), pp. 213-217
Main author: Yerushalmi, Raz
Medium: Conference paper
Language: English
Publication details: IEEE, 01.05.2023
ISSN: 2574-1934
Description
Summary: Deep reinforcement learning (DRL) has become a dominant paradigm for using deep learning to learn complex policies for reactive systems. However, these policies are "black boxes", i.e., opaque to humans and known to be susceptible to bugs. For example, it is hard, if not impossible, to guarantee that a trained DRL agent adheres to specific safety and fairness properties that may be required.

This doctoral dissertation's first and primary contribution is a novel approach to developing DRL agents that improves the DRL training process by pushing the learned policy both toward high performance on its main task and toward compliance with such safety and fairness properties, guaranteeing a high probability of compliance without compromising the performance of the resulting agent. The approach incorporates domain-specific knowledge, captured as key properties defined by domain experts, directly into the DRL optimization process, using behavioral languages that are natural to those experts. We have validated the proposed approach by extending the AI-Gym Python framework [1] for training DRL agents and integrating it with the BP-Py framework [2] for specifying scenario-based models [3], so that scenario objects can affect the training process through reward and cost functions; this demonstrated a dramatic improvement in the safety and performance of the agent. In addition, we have verified the resulting DRL agents using the Marabou verifier [4], confirming that they fully comply with the required safety and fairness properties. We have applied the approach to train DRL agents for use cases from the network communication and robotic navigation domains, with strong results.

A second contribution of this dissertation is to develop and leverage probabilistic verification methods for deep neural networks, to overcome the scalability limitations that currently restrict the applicability of neural-network verification to practical DRL agents. We carried out an initial validation of this concept in the image classification domain, with promising results.
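The abstract describes scenario objects shaping training through reward and cost functions but includes no code. Below is a minimal sketch of that reward-shaping idea expressed as a Gym environment wrapper; it is not the dissertation's implementation. ScenarioRewardWrapper, the scenario object, and its violation_cost method are hypothetical names, and the sketch assumes the classic Gym step() API that returns a 4-tuple (newer Gym/Gymnasium versions return a 5-tuple).

```python
import gym

class ScenarioRewardWrapper(gym.Wrapper):
    """Adds a scenario-derived penalty to the task reward.

    `scenario` stands in for a scenario-based (BP-Py-style)
    specification object; `scenario.violation_cost(obs, action)` is
    assumed to return a non-negative cost when the current step
    violates a safety or fairness property, and 0 otherwise.
    """

    def __init__(self, env, scenario, penalty_weight=1.0):
        super().__init__(env)
        self.scenario = scenario
        self.penalty_weight = penalty_weight

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        cost = self.scenario.violation_cost(obs, action)
        info["spec_cost"] = cost  # expose the cost for logging/analysis
        # Push the learned policy away from property violations by
        # subtracting the weighted cost from the task reward.
        return obs, reward - self.penalty_weight * cost, done, info
```

Training on the wrapped environment then optimizes the shaped reward, so the policy is pulled toward both task performance and property compliance.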
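The post-training check with Marabou could, in outline, look like the following sketch using Marabou's Python bindings (Maraboupy). The network file, input bounds, and output threshold are placeholders, and the return signature of solve() differs across Maraboupy versions; recent versions return an exit-code string together with the satisfying assignment and statistics.

```python
from maraboupy import Marabou

# Load a trained policy network exported in NNet format (placeholder path).
network = Marabou.read_nnet("agent.nnet")
input_vars = network.inputVars[0].flatten()
output_vars = network.outputVars[0].flatten()

# Restrict inputs to the region over which the property should hold
# (placeholder bounds).
for v in input_vars:
    network.setLowerBound(v, 0.0)
    network.setUpperBound(v, 1.0)

# Encode the *negation* of the property: can the "unsafe" output score
# reach at least 0.5 anywhere in the region? "unsat" means it cannot,
# i.e., the property holds throughout the region.
network.setLowerBound(output_vars[0], 0.5)
exit_code, values, stats = network.solve()
print("property violated" if exit_code == "sat" else "property holds")
```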
DOI: 10.1109/ICSE-Companion58688.2023.00058