InfeRE: Step-by-Step Regex Generation via Chain of Inference

Automatically generating regular expressions (abbrev. regexes) from natural language description (NL2RE) has been an emerging research area. Prior studies treat regex as a linear sequence of tokens and generate the final expressions autoregressively in a single pass. They did not take into account t...

Celý popis

Uloženo v:

Podrobná bibliografie
Vydáno v:	IEEE/ACM International Conference on Automated Software Engineering : [proceedings] s. 1505 - 1515
Hlavní autoři:	Zhang, Shuai, Gu, Xiaodong, Chen, Yuting, Shen, Beijun
Médium:	Konferenční příspěvek
Jazyk:	angličtina
Vydáno:	IEEE 11.09.2023
Témata:	Benchmark testing Chain of Inference Codes Decoding Natural languages Regex Generation Robustness Self-Consistency Decoding Software engineering Task analysis
ISSN:	2643-1572
On-line přístup:	Získat plný text
Tagy:	Přidat tag Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!

Popis
Shrnutí:	Automatically generating regular expressions (abbrev. regexes) from natural language description (NL2RE) has been an emerging research area. Prior studies treat regex as a linear sequence of tokens and generate the final expressions autoregressively in a single pass. They did not take into account the step-by-step internal text-matching processes behind the final results. This significantly hinders the efficacy and interpretability of regex generation by neural language models. In this paper, we propose a new paradigm called InfeRE, which decomposes the generation of regexes into chains of step-bystep inference. To enhance the robustness, we introduce a self-consistency decoding mechanism that ensembles multiple outputs sampled from different models. We evaluate InfeRE on two publicly available datasets, NL-RX-Turk and KB13, and compare the results with state-of-the-art approaches and the popular tree-based generation approach TRANX. Experimental results show that InfeRE substantially outperforms previous baselines, yielding 16.3% and 14.7% improvement in DFA@5 accuracy on two datasets, respectively.
ISSN:	2643-1572
DOI:	10.1109/ASE56229.2023.00111