Learning and Repair of Deep Reinforcement Learning Policies from Fuzz-Testing Data

Reinforcement learning from demonstrations (RLfD) is a promising approach to improve the exploration efficiency of reinforcement learning (RL) by learning from expert demonstrations in addition to interactions with the environment. In this paper, we propose a framework that combines techniques from...

Bibliographic details
Published in: Proceedings / International Conference on Software Engineering, pp. 38-50
Main authors: Tappler, Martin, Pferscher, Andrea, Aichernig, Bernhard K., Konighofer, Bettina
Format: Conference proceeding
Language: English
Published: ACM, 14 April 2024
Subjects:
ISSN: 1558-1225
Online access: Full text
Abstract Reinforcement learning from demonstrations (RLfD) is a promising approach to improving the exploration efficiency of reinforcement learning (RL) by learning from expert demonstrations in addition to interactions with the environment. In this paper, we propose a framework that combines techniques from search-based testing with RLfD, with the goal of raising the dependability of RL policies and reducing human engineering effort. Within our framework, we provide methods for efficiently training, evaluating, and repairing RL policies. Instead of relying on the costly collection of demonstrations from (human) experts, we automatically compute a diverse set of demonstrations via search-based fuzzing methods and use the fuzz demonstrations for RLfD. To evaluate the safety and robustness of the trained RL agent, we search for safety-critical scenarios in the black-box environment. Finally, when unsafe behavior is detected, we compute demonstrations through fuzz testing that represent safe behavior and use them to repair the policy. Our experiments show that our framework is able to efficiently learn high-performing and safe policies without requiring any expert knowledge.
Author Konighofer, Bettina
Aichernig, Bernhard K.
Pferscher, Andrea
Tappler, Martin
Author_xml – sequence: 1
  givenname: Martin
  surname: Tappler
  fullname: Tappler, Martin
  email: martin.tappler@ist.tugraz.at
  organization: Graz University of Technology, Institute of Software Technology,Graz,Austria
– sequence: 2
  givenname: Andrea
  surname: Pferscher
  fullname: Pferscher, Andrea
  email: andrea.pferscher@ist.tugraz.at
  organization: Graz University of Technology, Institute of Software Technology,Graz,Austria
– sequence: 3
  givenname: Bernhard K.
  surname: Aichernig
  fullname: Aichernig, Bernhard K.
  email: aichernig@ist.tugraz.at
  organization: Graz University of Technology, Institute of Software Technology,Graz,Austria
– sequence: 4
  givenname: Bettina
  surname: Konighofer
  fullname: Konighofer, Bettina
  email: bettina.koenighofer@iaik.tugraz.at
  organization: Graz University of Technology, Institute of Applied Information Processing and Communications,Graz,Austria
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
ESBDL
RIE
RIO
DOI 10.1145/3597503.3623311
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Xplore Open Access Journals
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 9798400702174
EISSN 1558-1225
EndPage 50
ExternalDocumentID 10548388
Genre orig-research
GroupedDBID -~X
.4S
.DC
29O
5VS
6IE
6IF
6IH
6IK
6IL
6IM
6IN
8US
AAJGR
AAWTH
ABLEC
ADZIZ
ALMA_UNASSIGNED_HOLDINGS
ARCSS
AVWKF
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
EDO
ESBDL
FEDTE
I-F
IEGSK
IJVOP
IPLJI
M43
OCL
RIE
RIL
RIO
ID FETCH-LOGICAL-a2321-5baf35d584f50255613bde5f0fe3fd3b2fd7e3db754de297b0f5400c77646a0f3
IEDL.DBID RIE
IngestDate Wed Aug 27 01:53:13 EDT 2025
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-a2321-5baf35d584f50255613bde5f0fe3fd3b2fd7e3db754de297b0f5400c77646a0f3
OpenAccessLink https://ieeexplore.ieee.org/document/10548388
PageCount 13
ParticipantIDs ieee_primary_10548388
PublicationCentury 2000
PublicationDate 2024-April-14
PublicationDateYYYYMMDD 2024-04-14
PublicationDate_xml – month: 04
  year: 2024
  text: 2024-April-14
  day: 14
PublicationDecade 2020
PublicationTitle Proceedings / International Conference on Software Engineering
PublicationTitleAbbrev ICSE
PublicationYear 2024
Publisher ACM
Publisher_xml – name: ACM
SSID ssib054357643
ssib055306466
ssj0006499
Score 2.3261175
Snippet Reinforcement learning from demonstrations (RLfD) is a promising approach to improve the exploration efficiency of reinforcement learning (RL) by learning from...
SourceID ieee
SourceType Publisher
StartPage 38
SubjectTerms Deep reinforcement learning
Fuzzing
Maintenance engineering
Policy repair
Reinforcement learning from demonstrations
Robustness
Safety
Search-based software testing
Software reliability
Training
Video games
Title Learning and Repair of Deep Reinforcement Learning Policies from Fuzz-Testing Data
URI https://ieeexplore.ieee.org/document/10548388
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE