Data Extraction via Semantic Regular Expression Synthesis

Bibliographic Details
Published in: Proceedings of the ACM on Programming Languages, Vol. 7, No. OOPSLA2, pp. 1848-1877
Main Authors: Chen, Qiaochu; Banerjee, Arko; Demiralp, Çağatay; Durrett, Greg; Dillig, Işıl
Format: Journal Article
Language: English
Published: New York, NY, USA: ACM, 16 October 2023
ISSN: 2475-1421
Abstract: Many data extraction tasks of practical relevance require not only syntactic pattern matching but also semantic reasoning about the content of the underlying text. While regular expressions are very well suited for tasks that require only syntactic pattern matching, they fall short for data extraction tasks that involve both a syntactic and semantic component. To address this issue, we introduce semantic regexes, a generalization of regular expressions that facilitates combined syntactic and semantic reasoning about textual data. We also propose a novel learning algorithm that can synthesize semantic regexes from a small number of positive and negative examples. Our proposed learning algorithm uses a combination of neural sketch generation and compositional type-directed synthesis for fast and effective generalization from a small number of examples. We have implemented these ideas in a new tool called Smore and evaluated it on representative data extraction tasks involving several textual datasets. Our evaluation shows that semantic regexes can better support complex data extraction tasks than standard regular expressions and that our learning algorithm significantly outperforms existing tools, including state-of-the-art neural networks and program synthesis tools.
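Illustrative sketch (not part of the catalog record or the paper): the abstract's central idea, a pattern that combines a syntactic regular expression with a semantic check and is selected because it agrees with a few positive and negative examples, can be approximated in a few lines of Python. Everything named here is an assumption made for illustration: the `SemanticPattern` class, the `is_city` lookup that stands in for a real semantic oracle, and the brute-force `first_consistent` search. None of it is Smore's actual syntax, API, or learning algorithm, which the abstract describes as neural sketch generation combined with compositional type-directed synthesis.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Optional

# Hypothetical stand-in for a semantic oracle. A real system would need a far
# richer notion of "is a city"; a fixed lookup set keeps the sketch runnable.
KNOWN_CITIES = {"austin", "boston", "seattle"}


def is_city(text: str) -> bool:
    """Semantic predicate: does the captured text name a known city?"""
    return text.strip().lower() in KNOWN_CITIES


@dataclass
class SemanticPattern:
    """An ordinary regex plus semantic predicates on its named groups."""
    syntax: str
    checks: Dict[str, Callable[[str], bool]] = field(default_factory=dict)

    def matches(self, s: str) -> bool:
        m = re.fullmatch(self.syntax, s)
        if m is None:
            return False  # the syntactic part already fails
        # Every semantic predicate must accept its captured group.
        return all(pred(m.group(name)) for name, pred in self.checks.items())


def consistent(p: SemanticPattern,
               positives: Iterable[str],
               negatives: Iterable[str]) -> bool:
    """A candidate is consistent if it accepts all positives and no negatives."""
    return (all(p.matches(s) for s in positives)
            and not any(p.matches(s) for s in negatives))


def first_consistent(candidates: Iterable[SemanticPattern],
                     positives: Iterable[str],
                     negatives: Iterable[str]) -> Optional[SemanticPattern]:
    """Naive search: return the first candidate that fits the examples."""
    positives, negatives = list(positives), list(negatives)
    for candidate in candidates:
        if consistent(candidate, positives, negatives):
            return candidate
    return None


SYNTAX = r"(?P<place>[A-Za-z ]+), (?P<state>[A-Z]{2})"
syntactic_only = SemanticPattern(SYNTAX)
with_semantics = SemanticPattern(SYNTAX, {"place": is_city})

positives = ["Austin, TX", "Boston, MA"]
negatives = ["Banana, TX", "Austin TX"]

chosen = first_consistent([syntactic_only, with_semantics], positives, negatives)
```

In this toy setup the purely syntactic candidate is rejected because it also accepts the negative example "Banana, TX", which has the right shape but the wrong meaning; only the candidate with the semantic check fits all four examples. That is the gap between syntactic and combined syntactic-semantic matching that the abstract points to.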
ArticleNumber 287
Author Chen, Qiaochu
Demiralp, Çağatay
Durrett, Greg
Dillig, Işıl
Banerjee, Arko
Author_xml – sequence: 1
  givenname: Qiaochu
  orcidid: 0000-0003-4680-5157
  surname: Chen
  fullname: Chen, Qiaochu
  email: qchen@cs.utexas.edu
  organization: University of Texas at Austin, Austin, USA
– sequence: 2
  givenname: Arko
  orcidid: 0009-0005-2690-6059
  surname: Banerjee
  fullname: Banerjee, Arko
  email: arko.banerjee@utexas.edu
  organization: University of Texas at Austin, Austin, USA
– sequence: 3
  givenname: Çağatay
  orcidid: 0009-0003-2080-0443
  surname: Demiralp
  fullname: Demiralp, Çağatay
  email: cagatay@csail.mit.edu
  organization: Massachusetts Institute of Technology, Cambridge, USA
– sequence: 4
  givenname: Greg
  orcidid: 0000-0002-7061-7298
  surname: Durrett
  fullname: Durrett, Greg
  email: gdurrett@cs.utexas.edu
  organization: University of Texas at Austin, Austin, USA
– sequence: 5
  givenname: Işıl
  orcidid: 0000-0001-8006-1230
  surname: Dillig
  fullname: Dillig, Işıl
  email: isil@cs.utexas.edu
  organization: University of Texas at Austin, Austin, USA
ContentType Journal Article
Copyright Owner/Author
Copyright_xml – notice: Owner/Author
DOI 10.1145/3622863
Discipline Computer Science
EISSN 2475-1421
EndPage 1877
ExternalDocumentID 10_1145_3622863
3622863
GrantInformation_xml – fundername: NSF (National Science Foundation)
  grantid: 1918889,1762299
  funderid: https://doi.org/10.13039/100000001
ISICitedReferencesCount 9
ISSN 2475-1421
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue OOPSLA2
Keywords Regular Expression
Program Synthesis
Language English
License This work is licensed under a Creative Commons Attribution 4.0 International License.
ORCID 0000-0003-4680-5157
0000-0002-7061-7298
0009-0005-2690-6059
0000-0001-8006-1230
0009-0003-2080-0443
OpenAccessLink https://dl.acm.org/doi/10.1145/3622863
PageCount 30
PublicationCentury 2000
PublicationDate 2023-10-16
PublicationDateYYYYMMDD 2023-10-16
PublicationDate_xml – month: 10
  year: 2023
  text: 2023-10-16
  day: 16
PublicationDecade 2020
PublicationPlace New York, NY, USA
PublicationPlace_xml – name: New York, NY, USA
PublicationTitle Proceedings of ACM on programming languages
PublicationTitleAbbrev ACM PACMPL
PublicationYear 2023
Publisher ACM
Publisher_xml – name: ACM
– reference: Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models are Few-Shot Learners. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (Eds.). 33, Curran Associates, Inc., 1877–1901.
– reference: Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Dan Klein. 2016. Neural Module Networks. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 39–48. https://doi.org/10.1109/CVPR.2016.12 10.1109/CVPR.2016.12
– reference: Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam M. Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Benton C. Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, Michael Isard, Guy Gur-Ari, Pengcheng Yin, Toju Duke, Anselm Levskaya, Sanjay Ghemawat, Sunipa Dev, Henryk Michalewski, Xavier García, Vedant Misra, Kevin Robinson, Liam Fedus, Denny Zhou, Daphne Ippolito, David Luan, Hyeontaek Lim, Barret Zoph, Alexander Spiridonov, Ryan Sepassi, David Dohan, Shivani Agrawal, Mark Omernick, Andrew M. Dai, Thanumalayan Sankaranarayana Pillai, Marie Pellat, Aitor Lewkowycz, Erica Moreira, Rewon Child, Oleksandr Polozov, Katherine Lee, Zongwei Zhou, Xuezhi Wang, Brennan Saeta, Mark Díaz, Orhan Firat, Michele Catasta, Jason Wei, Kathleen S. Meier-Hellstern, Douglas Eck, Jeff Dean, Slav Petrov, and Noah Fiedel. 2022. PaLM: Scaling Language Modeling with Pathways. ArXiv, abs/2204.02311 (2022).
– reference: Peter-Michael Osera and Steve Zdancewic. 2015. Type-and-Example-Directed Program Synthesis. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’15). Association for Computing Machinery, New York, NY, USA. 619–630. isbn:9781450334686 https://doi.org/10.1145/2737924.2738007 10.1145/2737924.2738007
– reference: OpenAI. 2022. Introducing ChatGPT. https://openai.com/blog/chatgpt Accessed on March 16, 2023
– reference: Naman Jain, Skanda Vaidyanath, Arun Iyer, Nagarajan Natarajan, Suresh Parthasarathy, Sriram Rajamani, and Rahul Sharma. 2022. Jigsaw: Large Language Models Meet Program Synthesis. In Proceedings of the 44th International Conference on Software Engineering (ICSE ’22). Association for Computing Machinery, New York, NY, USA. 1219–1231. isbn:9781450392211 https://doi.org/10.1145/3510003.3510203 10.1145/3510003.3510203
– reference: Yu Feng, Ruben Martins, Jacob Van Geffen, Isil Dillig, and Swarat Chaudhuri. 2017. Component-Based Synthesis of Table Consolidation and Transformation Tasks from Examples. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2017). Association for Computing Machinery, New York, NY, USA. 422–436. isbn:9781450349888 https://doi.org/10.1145/3062341.3062351 10.1145/3062341.3062351
– reference: Qiaochu Chen, Arko Banerjee, Çağatay Demiralp, Greg Durrett, and Isil Dillig. 2023. Data Extraction via Semantic Regular Expression Synthesis. https://doi.org/10.5281/zenodo.8144182 10.5281/zenodo.8144182
– reference: Oleksandr Polozov and Sumit Gulwani. 2015. FlashMeta: A Framework for Inductive Program Synthesis. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2015). Association for Computing Machinery, New York, NY, USA. 107–126. isbn:9781450336895 https://doi.org/10.1145/2814270.2814310 10.1145/2814270.2814310
– reference: Chengyue Jiang, Zijian Jin, and Kewei Tu. 2021. Neuralizing Regular Expressions for Slot Filling. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Online and Punta Cana, Dominican Republic. 9481–9498. https://doi.org/10.18653/v1/2021.emnlp-main.747 10.18653/v1/2021.emnlp-main.747
– reference: E Mark Gold. 1978. Complexity of automaton identification from given data. Information and Control, 37, 3 (1978), 302 – 320.
– reference: Sumit Gulwani. 2011. Automating String Processing in Spreadsheets Using Input-Output Examples. In Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’11). Association for Computing Machinery, New York, NY, USA. 317–330. isbn:9781450304900 https://doi.org/10.1145/1926385.1926423 10.1145/1926385.1926423
– reference: Laura Firoiu, Tim Oates, and Paul R. Cohen. 1998. Learning Regular Languages from Positive Evidence. In Proceedings of the Twentieth Annual Conference of the Cognitive Science Society. 350–355.
– reference: Panupong Pasupat and Percy Liang. 2015. Compositional Semantic Parsing on Semi-Structured Tables. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, Beijing, China. 1470–1480. https://doi.org/10.3115/v1/P15-1142 10.3115/v1/P15-1142
– reference: Gust Verbruggen, Vu Le, and Sumit Gulwani. 2021. Semantic Programming by Example with Pre-Trained Models. Proc. ACM Program. Lang., 5, OOPSLA (2021), Article 100, oct, 25 pages. https://doi.org/10.1145/3485477 10.1145/3485477
Start page: 1848
Subject terms: Domain specific languages; Programming by example; Software and its engineering
Title: Data Extraction via Semantic Regular Expression Synthesis
URI: https://dl.acm.org/doi/10.1145/3622863
Volume: 7