Fast Deterministic Black-Box Context-Free Grammar Inference
Black-box context-free grammar inference is a hard problem as in many practical settings it only has access to a limited number of example programs. The state-of-the-art approach Arvada heuristically generalizes grammar rules starting from flat parse trees and is non-deterministic to explore differe...
| Published in: | Proceedings / International Conference on Software Engineering, pp. 1434-1445 |
|---|---|
| Main authors: | Arefin, Mohammad Rifat; Shetiya, Suraj; Wang, Zili; Csallner, Christoph |
| Format: | Conference proceeding |
| Language: | English |
| Published: | ACM, 14 April 2024 |
| Subjects: | |
| ISSN: | 1558-1225 |
| Online access: | Full text |
| Abstract | Black-box context-free grammar inference is a hard problem as in many practical settings it only has access to a limited number of example programs. The state-of-the-art approach Arvada heuristically generalizes grammar rules starting from flat parse trees and is non-deterministic to explore different generalization sequences. We observe that many of Arvada's generalization steps violate common language concept nesting rules. We thus propose to pre-structure input programs along these nesting rules, apply learnt rules recursively, and make black-box context-free grammar inference deterministic. The resulting TreeVada yielded faster runtime and higher-quality grammars in an empirical comparison. The TreeVada source code, scripts, evaluation parameters, and training data are open-source and publicly available (https://doi.org/10.6084/m9.figshare.23907738). |
|---|---|
| Author | Shetiya, Suraj; Arefin, Mohammad Rifat; Wang, Zili; Csallner, Christoph |
| Author_xml | – sequence: 1 givenname: Mohammad Rifat surname: Arefin fullname: Arefin, Mohammad Rifat organization: University of Texas at Arlington,Computer Science and Engineering Department,Arlington,Texas,USA – sequence: 2 givenname: Suraj surname: Shetiya fullname: Shetiya, Suraj organization: University of Texas at Arlington,Computer Science and Engineering Department,Arlington,Texas,USA – sequence: 3 givenname: Zili surname: Wang fullname: Wang, Zili organization: Iowa State University,Department of Computer Science,Ames,Iowa,USA – sequence: 4 givenname: Christoph surname: Csallner fullname: Csallner, Christoph organization: University of Texas at Arlington,Computer Science and Engineering Department,Arlington,Texas,USA |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DOI | 10.1145/3597503.3639214 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan (POP) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Xplore Open Access Journals IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP) 1998-present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 9798400702174 |
| EISSN | 1558-1225 |
| EndPage | 1445 |
| ExternalDocumentID | 10548677 |
| Genre | orig-research |
| GrantInformation_xml | – fundername: National Science Foundation grantid: 1911017 funderid: 10.13039/100000001 |
| IEDL.DBID | RIE |
| IngestDate | Wed Aug 27 01:53:12 EDT 2025 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| OpenAccessLink | https://ieeexplore.ieee.org/document/10548677 |
| PageCount | 12 |
| ParticipantIDs | ieee_primary_10548677 |
| PublicationCentury | 2000 |
| PublicationDate | 2024-April-14 |
| PublicationDateYYYYMMDD | 2024-04-14 |
| PublicationDate_xml | – month: 04 year: 2024 text: 2024-April-14 day: 14 |
| PublicationDecade | 2020 |
| PublicationTitle | Proceedings / International Conference on Software Engineering |
| PublicationTitleAbbrev | ICSE |
| PublicationYear | 2024 |
| Publisher | ACM |
| Publisher_xml | – name: ACM |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 1434 |
| SubjectTerms | bracket-implied nesting structure; Closed box; deterministic synthesis; Grammar; Grammar inference; nested language concepts; oracle; Runtime; Source coding; Training data |
| Title | Fast Deterministic Black-Box Context-Free Grammar Inference |
| URI | https://ieeexplore.ieee.org/document/10548677 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | IEEE |
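
The abstract's idea of pre-structuring input programs along nesting rules, which the subject terms call a bracket-implied nesting structure, can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `prestructure`, the bracket set, and the fallback to a flat tree for unbalanced input are all assumptions made here for illustration.

```python
from typing import List, Union

# A tree node is either a single character (leaf) or a list of child nodes.
Tree = Union[str, List["Tree"]]

# Assumed bracket pairs; the actual tool may recognize a different set.
BRACKETS = {"(": ")", "[": "]", "{": "}"}


def prestructure(program: str) -> Tree:
    """Group characters into nested lists along matching bracket pairs.

    Hypothetical helper: unbalanced input is returned as a flat list of
    characters, i.e. the kind of flat tree a non-structured approach starts from.
    """
    root: List[Tree] = []
    stack: List[List[Tree]] = [root]  # open subtrees, innermost last
    expected: List[str] = []          # closing brackets still awaited
    for ch in program:
        if ch in BRACKETS:
            # An opening bracket starts a new nested subtree.
            node: List[Tree] = [ch]
            stack[-1].append(node)
            stack.append(node)
            expected.append(BRACKETS[ch])
        elif expected and ch == expected[-1]:
            # The matching closer ends the innermost subtree.
            stack[-1].append(ch)
            stack.pop()
            expected.pop()
        elif ch in BRACKETS.values():
            return list(program)  # stray or mismatched closer: stay flat
        else:
            stack[-1].append(ch)
    if expected:
        return list(program)  # unclosed opener: stay flat
    return root
```

Calling `prestructure("a(b[c])d")` nests the bracketed regions as subtrees, so a later generalization step would only group siblings that sit at the same nesting level rather than spans that cut across a bracket boundary.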