Fast Deterministic Black-Box Context-Free Grammar Inference

Bibliographic Details
Published in: Proceedings / International Conference on Software Engineering, pp. 1434-1445
Main authors: Arefin, Mohammad Rifat; Shetiya, Suraj; Wang, Zili; Csallner, Christoph
Format: Conference proceeding
Language: English
Published: ACM, 14 April 2024
ISSN: 1558-1225
Abstract: Black-box context-free grammar inference is a hard problem because, in many practical settings, the inference algorithm has access to only a limited number of example programs. The state-of-the-art approach, Arvada, heuristically generalizes grammar rules starting from flat parse trees and is non-deterministic so that it can explore different generalization sequences. We observe that many of Arvada's generalization steps violate common language concept nesting rules. We thus propose to pre-structure input programs along these nesting rules, apply learnt rules recursively, and make black-box context-free grammar inference deterministic. The resulting TreeVada yielded faster runtime and higher-quality grammars in an empirical comparison. The TreeVada source code, scripts, evaluation parameters, and training data are open-source and publicly available (https://doi.org/10.6084/m9.figshare.23907738).
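
The idea of pre-structuring an input program along bracket-implied nesting can be illustrated with a small Python sketch. This is not the authors' implementation: the function name `prestructure`, the set of bracket pairs, and the nested-list tree representation are all assumptions chosen for illustration. It only shows how a flat character sequence could be turned into a tree whose subtrees follow matched brackets, before any grammar generalization takes place.

```python
# Hypothetical illustration only; not taken from the paper's source code.
BRACKETS = {"(": ")", "[": "]", "{": "}"}  # assumed bracket pairs


def prestructure(program: str) -> list:
    """Turn a flat character sequence into a nested list.

    Each matched bracket pair opens a new subtree that holds the
    opening bracket, the enclosed content, and the closing bracket.
    """
    root: list = []
    # Each stack entry: (node being filled, closing bracket it waits for).
    stack = [(root, None)]
    for ch in program:
        node, expected = stack[-1]
        if ch in BRACKETS:
            child = [ch]
            node.append(child)
            stack.append((child, BRACKETS[ch]))
        elif ch == expected:
            node.append(ch)
            stack.pop()
        elif ch in BRACKETS.values():
            raise ValueError(f"unbalanced closing bracket {ch!r}")
        else:
            node.append(ch)
    if len(stack) != 1:
        raise ValueError("unclosed opening bracket")
    return root


tree = prestructure("a(b[c])d")
```

In this sketch, the bracketed region `(b[c])` becomes one subtree and `[c]` a subtree inside it, so a later generalization step could be restricted to units that respect the nesting instead of arbitrary spans of a flat sequence.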
Authors and affiliations:
  1. Arefin, Mohammad Rifat: University of Texas at Arlington, Computer Science and Engineering Department, Arlington, Texas, USA
  2. Shetiya, Suraj: University of Texas at Arlington, Computer Science and Engineering Department, Arlington, Texas, USA
  3. Wang, Zili: Iowa State University, Department of Computer Science, Ames, Iowa, USA
  4. Csallner, Christoph: University of Texas at Arlington, Computer Science and Engineering Department, Arlington, Texas, USA
CODEN: IEEPAD
DOI: 10.1145/3597503.3639214
Discipline: Computer Science
EISBN: 9798400702174
EISSN: 1558-1225
Genre: Original research
Funding: National Science Foundation, grant 1911017 (funder ID 10.13039/100000001)
Open access via DOI: true
Open access: true
Peer reviewed (as flagged in this record): false
Scholarly: true
Open-access link: https://ieeexplore.ieee.org/document/10548677
Page count: 12
Publication title abbreviation: ICSE
Subject terms: bracket-implied nesting structure; Closed box; deterministic synthesis; Grammar; Grammar inference; nested language concepts; oracle; Runtime; Source coding; Training data