Data Extraction via Semantic Regular Expression Synthesis
| Published in: | Proceedings of the ACM on Programming Languages, Vol. 7, No. OOPSLA2, pp. 1848–1877 |
|---|---|
| Main Authors: | Chen, Qiaochu; Banerjee, Arko; Demiralp, Çağatay; Durrett, Greg; Dillig, Işıl |
| Format: | Journal Article |
| Language: | English |
| Published: | New York, NY, USA: ACM, 16.10.2023 |
| Subjects: | |
| ISSN: | 2475-1421 |
| Abstract | Many data extraction tasks of practical relevance require not only syntactic pattern matching but also semantic reasoning about the content of the underlying text. While regular expressions are very well suited for tasks that require only syntactic pattern matching, they fall short for data extraction tasks that involve both a syntactic and semantic component. To address this issue, we introduce semantic regexes, a generalization of regular expressions that facilitates combined syntactic and semantic reasoning about textual data. We also propose a novel learning algorithm that can synthesize semantic regexes from a small number of positive and negative examples. Our proposed learning algorithm uses a combination of neural sketch generation and compositional type-directed synthesis for fast and effective generalization from a small number of examples. We have implemented these ideas in a new tool called Smore and evaluated it on representative data extraction tasks involving several textual datasets. Our evaluation shows that semantic regexes can better support complex data extraction tasks than standard regular expressions and that our learning algorithm significantly outperforms existing tools, including state-of-the-art neural networks and program synthesis tools. |
|---|---|
| ArticleNumber | 287 |
| Author | Chen, Qiaochu; Demiralp, Çağatay; Durrett, Greg; Dillig, Işıl; Banerjee, Arko |
| Author_xml | – sequence: 1 givenname: Qiaochu orcidid: 0000-0003-4680-5157 surname: Chen fullname: Chen, Qiaochu email: qchen@cs.utexas.edu organization: University of Texas at Austin, Austin, USA – sequence: 2 givenname: Arko orcidid: 0009-0005-2690-6059 surname: Banerjee fullname: Banerjee, Arko email: arko.banerjee@utexas.edu organization: University of Texas at Austin, Austin, USA – sequence: 3 givenname: Çağatay orcidid: 0009-0003-2080-0443 surname: Demiralp fullname: Demiralp, Çağatay email: cagatay@csail.mit.edu organization: Massachusetts Institute of Technology, Cambridge, USA – sequence: 4 givenname: Greg orcidid: 0000-0002-7061-7298 surname: Durrett fullname: Durrett, Greg email: gdurrett@cs.utexas.edu organization: University of Texas at Austin, Austin, USA – sequence: 5 givenname: Işıl orcidid: 0000-0001-8006-1230 surname: Dillig fullname: Dillig, Işıl email: isil@cs.utexas.edu organization: University of Texas at Austin, Austin, USA |
| CitedBy_id | crossref_primary_10_1007_s10115_024_02232_1 crossref_primary_10_1145_3709677 crossref_primary_10_1007_s41060_024_00612_y crossref_primary_10_1145_3656418 crossref_primary_10_1016_j_softx_2025_102072 crossref_primary_10_1145_3632858 crossref_primary_10_1145_3729300 |
| Cites_doi | 10.3115/v1/P14-1037 10.1016/j.websem.2009.07.002 10.1145/1926385.1926423 10.3115/v1/P15-1142 10.1145/3485477 10.1145/3385412.3386027 10.1145/3394486.3403153 10.1016/0890-5401(87)90052-6 10.1145/256167.256195 10.1145/3183713.3183729 10.1007/978-3-031-25803-9_1 10.5281/zenodo.8144182 10.18653/v1/2021.emnlp-main.747 10.1145/2993236.2993244 10.1145/2837614.2837629 10.1145/2594291.2594333 10.1145/2814270.2814310 10.1145/3510003.3510203 10.1145/3485489 10.1016/S0019-9958(78)90562-4 10.18653/v1/2022.naacl-main.396 10.1145/3453483.3454047 10.1145/3192366.3192382 10.5555/2832249.2832359 10.1007/978-3-031-04083-2_11 10.1145/3519939.3523722 10.1145/3318464.3380608 10.1145/3062341.3062351 10.1023/A:1010822518073 10.1145/2737924.2737977 10.18653/v1/2021.findings-emnlp.146 10.1145/3385412.3385988 10.18653/v1/N16-1181 10.1109/CVPR.2016.12 10.1145/2908080.2908093 10.5555/3495724.3496139 10.1145/2737924.2738007 10.1145/3485535 |
| ContentType | Journal Article |
| Copyright | Owner/Author |
| DOI | 10.1145/3622863 |
| Discipline | Computer Science |
| EISSN | 2475-1421 |
| EndPage | 1877 |
| GrantInformation_xml | – fundername: NSF (National Science Foundation) grantid: 1918889,1762299 funderid: https://doi.org/10.13039/100000001 |
| ISICitedReferencesCount | 9 |
| ISSN | 2475-1421 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | OOPSLA2 |
| Keywords | Regular Expression; Program Synthesis |
| Language | English |
| License | This work is licensed under a Creative Commons Attribution 4.0 International License. |
| ORCID | 0000-0003-4680-5157 0000-0002-7061-7298 0009-0005-2690-6059 0000-0001-8006-1230 0009-0003-2080-0443 |
| OpenAccessLink | https://dl.acm.org/doi/10.1145/3622863 |
| PageCount | 30 |
| PublicationCentury | 2000 |
| PublicationDate | 2023-10-16 |
| PublicationDecade | 2020 |
| PublicationPlace | New York, NY, USA |
| PublicationTitle | Proceedings of the ACM on Programming Languages |
| PublicationTitleAbbrev | ACM PACMPL |
| PublicationYear | 2023 |
| Publisher | ACM |
| References_xml | – reference: Kia Rahmani, Mohammad Raza, Sumit Gulwani, Vu Le, Daniel Morris, Arjun Radhakrishna, Gustavo Soares, and Ashish Tiwari. 2021. Multi-Modal Program Inference: A Marriage of Pre-Trained Language Models and Component-Based Synthesis. Proc. ACM Program. Lang., 5, OOPSLA (2021), Article 158, oct, 29 pages. https://doi.org/10.1145/3485535 10.1145/3485535 – reference: R. Alquezar and A. Sanfeliu. 1994. Incremental Grammatical Inference From Positive And Negative Data Using Unbiased Finite State Automata. In In Proceedings of the ACL’02 Workshop on Unsupervised Lexical Acquisition. 291–300. – reference: Shuyan Zhou, Uri Alon, Frank F. Xu, Zhengbao Jiang, and Graham Neubig. 2023. DocPrompting: Generating Code by Retrieving the Docs. In The Eleventh International Conference on Learning Representations. https://openreview.net/forum?id=ZTCxT2t2Ru – reference: Suguman Bansal, Giuseppe De Giacomo, Antonio Di Stasio, Yong Li, Moshe Y. Vardi, and Shufang Zhu. 2023. Compositional Safety LTL Synthesis. In Verified Software. Theories, Tools and Experiments.: 14th International Conference, VSTTE 2022, Trento, Italy, October 17–18, 2022, Revised Selected Papers. Springer-Verlag, Berlin, Heidelberg. 1–19. isbn:978-3-031-25802-2 https://doi.org/10.1007/978-3-031-25803-9_1 10.1007/978-3-031-25803-9_1 – reference: Ameesh Shah, Eric Zhan, Jennifer J. Sun, Abhinav Verma, Yisong Yue, and Swarat Chaudhuri. 2020. Learning Differentiable Programs with Admissible Neural Heuristics. In Proceedings of the 34th International Conference on Neural Information Processing Systems (NIPS’20). Curran Associates Inc., Red Hook, NY, USA. Article 415, 13 pages. isbn:9781713829546 – reference: Richard Shin and Benjamin Van Durme. 2022. Few-Shot Semantic Parsing with Language Models Trained on Code. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 
Association for Computational Linguistics, Seattle, United States. 5417–5425. https://doi.org/10.18653/v1/2022.naacl-main.396 10.18653/v1/2022.naacl-main.396 – reference: Osbert Bastani, Jeevana Priya Inala, and Armando Solar-Lezama. 2022. Interpretable, Verifiable, and Robust Reinforcement Learning via Program Synthesis. Springer International Publishing, Cham. 207–228. isbn:978-3-031-04083-2 https://doi.org/10.1007/978-3-031-04083-2_11 10.1007/978-3-031-04083-2_11 – reference: Yu Feng, Ruben Martins, Osbert Bastani, and Isil Dillig. 2018. Program Synthesis Using Conflict-Driven Learning. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2018). Association for Computing Machinery, New York, NY, USA. 420–435. isbn:9781450356985 https://doi.org/10.1145/3192366.3192382 10.1145/3192366.3192382 – reference: Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, and Caiming Xiong. 2023. CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis. In The Eleventh International Conference on Learning Representations. https://openreview.net/forum?id=iaYcJKpY2B_ – reference: Nadia Polikarpova, Ivan Kuraj, and Armando Solar-Lezama. 2016. Program Synthesis from Polymorphic Refinement Types. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’16). Association for Computing Machinery, New York, NY, USA. 522–538. isbn:9781450342612 https://doi.org/10.1145/2908080.2908093 10.1145/2908080.2908093 – reference: Zhoujun Cheng, Tianbao Xie, Peng Shi, Chengzu Li, Rahul Nadkarni, Yushi Hu, Caiming Xiong, Dragomir Radev, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, and Tao Yu. 2023. Binding Language Models in Symbolic Languages. In The Eleventh International Conference on Learning Representations. https://openreview.net/forum?id=lH1PV42cbF – reference: John K. Feser, Swarat Chaudhuri, and Isil Dillig. 2015. 
Synthesizing Data Structure Transformations from Input-Output Examples. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’15). Association for Computing Machinery, New York, NY, USA. 229–239. isbn:9781450334686 https://doi.org/10.1145/2737924.2737977 10.1145/2737924.2737977 – reference: Mohammad Raza and Sumit Gulwani. 2020. Web Data Extraction Using Hybrid Program Synthesis: A Combination of Top-down and Bottom-up Inference. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (SIGMOD ’20). Association for Computing Machinery, New York, NY, USA. 1967–1978. isbn:9781450367356 https://doi.org/10.1145/3318464.3380608 10.1145/3318464.3380608 – reference: Christian Bizer, Jens Lehmann, Georgi Kobilarov, Sören Auer, Christian Becker, Richard Cyganiak, and Sebastian Hellmann. 2009. DBpedia - A crystallization point for the Web of Data. Journal of Web Semantics, 7, 3 (2009), 154–165. issn:1570-8268 https://doi.org/10.1016/j.websem.2009.07.002 The Web of Data 10.1016/j.websem.2009.07.002 – reference: Rajesh Parekh and Vasant Honavar. 2001. Learning DFA from Simple Examples. Machine Learning, 44, 1 (2001), 01 Jul, 9–35. issn:1573-0565 https://doi.org/10.1023/A:1010822518073 10.1023/A:1010822518073 – reference: Guoqiang Zhang, Yuanchao Xu, Xipeng Shen, and Işıl Dillig. 2021. UDF to SQL Translation through Compositional Lazy Inductive Synthesis. Proc. ACM Program. Lang., 5, OOPSLA (2021), Article 112, oct, 26 pages. https://doi.org/10.1145/3485489 10.1145/3485489 – reference: Dana Angluin. 1987. Learning Regular Sets from Queries and Counterexamples. Inf. Comput., 75, 2 (1987), 87–106. – reference: Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Dan Klein. 2016. Learning to Compose Neural Networks for Question Answering. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 
Association for Computational Linguistics, San Diego, California. 1545–1554. https://doi.org/10.18653/v1/N16-1181 10.18653/v1/N16-1181 – reference: Qiaochu Chen, Aaron Lamoreaux, Xinyu Wang, Greg Durrett, Osbert Bastani, and Isil Dillig. 2021. Web Question Answering with Neurosymbolic Program Synthesis. In Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI 2021). Association for Computing Machinery, New York, NY, USA. 328–343. isbn:9781450383912 https://doi.org/10.1145/3453483.3454047 10.1145/3453483.3454047 – reference: Vu Le and Sumit Gulwani. 2014. FlashExtract: A Framework for Data Extraction by Examples. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’14). ACM, 542–553. isbn:978-1-4503-2784-8 https://doi.org/10.1145/2594291.2594333 10.1145/2594291.2594333 – reference: Xi Ye, Qiaochu Chen, Isil Dillig, and Greg Durrett. 2021. Optimal Neural Program Synthesis from Multimodal Specifications. In Findings of the Association for Computational Linguistics: EMNLP 2021. Association for Computational Linguistics, Punta Cana, Dominican Republic. 1691–1704. https://doi.org/10.18653/v1/2021.findings-emnlp.146 10.18653/v1/2021.findings-emnlp.146 – reference: Michael Greenberg, Ryan Beckett, and Eric Campbell. 2022. Kleene Algebra modulo Theories: A Framework for Concrete KATs. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI 2022). Association for Computing Machinery, New York, NY, USA. 594–608. isbn:9781450392655 https://doi.org/10.1145/3519939.3523722 10.1145/3519939.3523722 – reference: Alexander L. Gaunt, Marc Brockschmidt, Nate Kushman, and Daniel Tarlow. 2017. Differentiable Programs with Neural Libraries. In Proceedings of the 34th International Conference on Machine Learning - Volume 70 (ICML’17). JMLR.org, 1213–1222. 
– reference: Mohammad Raza, Sumit Gulwani, and Natasa Milic-Frayling. 2015. Compositional Program Synthesis from Natural Language and Examples. In Proceedings of the 24th International Conference on Artificial Intelligence (IJCAI’15). AAAI Press, 792–800. isbn:9781577357384 – reference: Dexter Kozen. 1997. Kleene Algebra with Tests. ACM Trans. Program. Lang. Syst., 19, 3 (1997), may, 427–443. issn:0164-0925 https://doi.org/10.1145/256167.256195 10.1145/256167.256195 – reference: Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Pondé de Oliveira Pinto, Jared Kaplan, Harrison Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, Philippe Tillet, Felipe Petroski Such, Dave Cummings, Matthias Plappert, Fotios Chantzis, Elizabeth Barnes, Ariel Herbert-Voss, William Hebgen Guss, Alex Nichol, Alex Paino, Nikolas Tezak, Jie Tang, Igor Babuschkin, Suchir Balaji, Shantanu Jain, William Saunders, Christopher Hesse, Andrew N. Carr, Jan Leike, Joshua Achiam, Vedant Misra, Evan Morikawa, Alec Radford, Matthew Knight, Miles Brundage, Mira Murati, Katie Mayer, Peter Welinder, Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, and Wojciech Zaremba. 2021. Evaluating Large Language Models Trained on Code. CoRR, abs/2107.03374 (2021), arXiv:2107.03374. arxiv:2107.03374 – reference: Rajesh Parekh and Vasant Honavar. 1996. An incremental interactive algorithm for regular grammar inference. In Grammatical Interference: Learning Syntax from Sentences, Laurent Miclet and Colin de la Higuera (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg. 238–249. isbn:978-3-540-70678-6 – reference: Panupong Pasupat and Percy Liang. 2014. Zero-shot Entity Extraction from Web Pages. 
In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Baltimore, Maryland. 391–401. https://doi.org/10.3115/v1/P14-1037 |
| StartPage | 1848 |
| SubjectTerms | Domain specific languages; Programming by example; Software and its engineering |
| SubjectTermsDisplay | Software and its engineering -- Domain specific languages; Software and its engineering -- Programming by example |
| Title | Data Extraction via Semantic Regular Expression Synthesis |
| URI | https://dl.acm.org/doi/10.1145/3622863 |
| Volume | 7 |