InfeRE: Step-by-Step Regex Generation via Chain of Inference

Bibliographic details
Published in: IEEE/ACM International Conference on Automated Software Engineering : [proceedings], pp. 1505-1515
Main authors: Zhang, Shuai; Gu, Xiaodong; Chen, Yuting; Shen, Beijun
Format: Conference paper
Language: English
Published: IEEE, 2023-09-11
Subjects:
ISSN: 2643-1572
Online access: Get full text
Abstract Automatically generating regular expressions (abbrev. regexes) from natural language descriptions (NL2RE) is an emerging research area. Prior studies treat a regex as a linear sequence of tokens and generate the final expression autoregressively in a single pass. They do not take into account the step-by-step internal text-matching processes behind the final results. This significantly hinders the efficacy and interpretability of regex generation by neural language models. In this paper, we propose a new paradigm called InfeRE, which decomposes the generation of regexes into chains of step-by-step inference. To enhance robustness, we introduce a self-consistency decoding mechanism that ensembles multiple outputs sampled from different models. We evaluate InfeRE on two publicly available datasets, NL-RX-Turk and KB13, and compare the results with state-of-the-art approaches and the popular tree-based generation approach TRANX. Experimental results show that InfeRE substantially outperforms previous baselines, yielding 16.3% and 14.7% improvements in DFA@5 accuracy on the two datasets, respectively.
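The two ideas named in the abstract, building a regex as a chain of intermediate steps and choosing among several sampled outputs by agreement, can be illustrated with a minimal Python sketch. This record does not describe the paper's actual step format or voting rule, so everything here is an assumption made for illustration: the `<sN>` placeholder syntax, the helper names `resolve_chain` and `self_consistent_choice`, and the use of match behaviour on a few probe strings as a rough stand-in for semantic (DFA) equivalence.

```python
import re
from collections import Counter


def resolve_chain(steps):
    """Resolve a chain of steps into one regex.

    Hypothetical format: each step is a regex fragment that may refer to an
    earlier step N with the placeholder <sN>. The last step is the result.
    """
    resolved = []
    for step in steps:
        for i, prev in enumerate(resolved, start=1):
            # Wrap the earlier step in a non-capturing group so that
            # quantifiers applied to the placeholder cover the whole fragment.
            step = step.replace(f"<s{i}>", f"(?:{prev})")
        resolved.append(step)
    return resolved[-1]


def signature(regex, probes):
    """Record which probe strings the regex fully matches."""
    return tuple(re.fullmatch(regex, p) is not None for p in probes)


def self_consistent_choice(candidates, probes):
    """Return a candidate whose probe behaviour is the most common one.

    Agreement on probes only approximates equivalence; it is an assumption
    of this sketch, not the paper's stated procedure.
    """
    sigs = [signature(c, probes) for c in candidates]
    winner, _count = Counter(sigs).most_common(1)[0]
    return candidates[sigs.index(winner)]


# Hypothetical chain for "three digits, a dash, then three digits".
chain = ["[0-9]", "<s1>{3}", "<s2>-<s2>"]
candidates = [resolve_chain(chain), r"\d{3}-\d{3}", r"\d+"]
chosen = self_consistent_choice(candidates, ["123-456", "12", "abc"])
```

In this example two of the three candidates behave identically on the probe strings, so the first of them is selected and the outlier `\d+` is discarded.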
Author Shen, Beijun
Zhang, Shuai
Gu, Xiaodong
Chen, Yuting
Author_xml – sequence: 1
  givenname: Shuai
  surname: Zhang
  fullname: Zhang, Shuai
  email: zhangshuai2000@sjtu.edu.cn
  organization: School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University,Shanghai,China
– sequence: 2
  givenname: Xiaodong
  surname: Gu
  fullname: Gu, Xiaodong
  email: xiaodong.gu@sjtu.edu.cn
  organization: School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University,Shanghai,China
– sequence: 3
  givenname: Yuting
  surname: Chen
  fullname: Chen, Yuting
  email: chenyt@sjtu.edu.cn
  organization: School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University,Shanghai,China
– sequence: 4
  givenname: Beijun
  surname: Shen
  fullname: Shen, Beijun
  email: bjshen@sjtu.edu.cn
  organization: School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University,Shanghai,China
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/ASE56229.2023.00111
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 9798350329964
EISSN 2643-1572
EndPage 1515
ExternalDocumentID 10298466
Genre orig-research
GrantInformation_xml – fundername: National Natural Science Foundation of China
  grantid: 62232003,62102244,62032004,62272296
  funderid: 10.13039/501100001809
IEDL.DBID RIE
ISICitedReferencesCount 5
IngestDate Wed Aug 27 02:32:41 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
PageCount 11
ParticipantIDs ieee_primary_10298466
PublicationCentury 2000
PublicationDate 2023-Sept.-11
PublicationDateYYYYMMDD 2023-09-11
PublicationDate_xml – month: 09
  year: 2023
  text: 2023-Sept.-11
  day: 11
PublicationDecade 2020
PublicationTitle IEEE/ACM International Conference on Automated Software Engineering : [proceedings]
PublicationTitleAbbrev ASE
PublicationYear 2023
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0051577
ssib057256115
Score 2.3025491
SourceID ieee
SourceType Publisher
StartPage 1505
SubjectTerms Benchmark testing
Chain of Inference
Codes
Decoding
Natural languages
Regex Generation
Robustness
Self-Consistency Decoding
Software engineering
Task analysis
Title InfeRE: Step-by-Step Regex Generation via Chain of Inference
URI https://ieeexplore.ieee.org/document/10298466
WOSCitedRecordID wos001103357200120&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint