SQLizer: query synthesis from natural language
This paper presents a new technique for automatically synthesizing SQL queries from natural language (NL). At the core of our technique is a new NL-based program synthesis methodology that combines semantic parsing techniques from the NLP community with type-directed program synthesis and automated...
| Published in: | Proceedings of ACM on programming languages, Volume 1, Issue OOPSLA, pp. 1-26 |
|---|---|
| Main authors: | Navid Yaghmazadeh, Yuepeng Wang, Isil Dillig, Thomas Dillig |
| Format: | Journal Article |
| Language: | English |
| Published: | 2017-10-01 |
| ISSN: | 2475-1421 |
| Abstract | This paper presents a new technique for automatically synthesizing SQL queries from natural language (NL). At the core of our technique is a new NL-based program synthesis methodology that combines semantic parsing techniques from the NLP community with type-directed program synthesis and automated program repair. Starting with a program sketch obtained using standard parsing techniques, our approach involves an iterative refinement loop that alternates between probabilistic type inhabitation and automated sketch repair. We use the proposed idea to build an end-to-end system called SQLIZER that can synthesize SQL queries from natural language. Our method is fully automated, works for any database without requiring additional customization, and does not require users to know the underlying database schema. We evaluate our approach on over 450 natural language queries concerning three different databases, namely MAS, IMDB, and YELP. Our experiments show that the desired query is ranked within the top 5 candidates in close to 90% of the cases and that SQLIZER outperforms NALIR, a state-of-the-art tool that won a best paper award at VLDB'14. |
|---|---|
| Author | Yaghmazadeh, Navid; Dillig, Isil; Wang, Yuepeng; Dillig, Thomas |
| Author_xml | – sequence: 1 givenname: Navid surname: Yaghmazadeh fullname: Yaghmazadeh, Navid organization: University of Texas at Austin, USA – sequence: 2 givenname: Yuepeng surname: Wang fullname: Wang, Yuepeng organization: University of Texas at Austin, USA – sequence: 3 givenname: Isil surname: Dillig fullname: Dillig, Isil organization: University of Texas at Austin, USA – sequence: 4 givenname: Thomas surname: Dillig fullname: Dillig, Thomas organization: University of Texas at Austin, USA |
| Cites_doi | 10.1145/2491956.2462180 10.1007/3-540-44829-2_8 10.1145/3009837.3009851 10.3115/1693756.1693772 10.1145/1168857.1168907 10.1016/0004-3702(87)90011-7 10.1145/2837614.2837617 10.1145/1101908.1101949 10.1145/2666356.2594333 10.1145/2462456.2464443 10.1145/2858965.2814310 10.1109/ASE.2013.6693082 10.1145/581396.581397 10.1145/1065010.1065045 10.1145/2884781.2884786 10.14778/2735461.2735468 10.1145/2908080.2908093 10.1145/3062341.3062365 10.3115/1117794.1117811 10.1145/1449764.1449767 10.1109/ICSE.2012.6227211 10.18653/v1/D13-1160 10.3115/v1/P14-5010 10.1145/2786805.2786811 10.1145/1282480.1282482 10.1145/2908080.2908088 10.1145/320251.320253 10.3115/1220355.1220376 10.1146/annurev-linguist-030514-125312 10.1145/1559845.1559902 10.1145/1993498.1993550 10.1007/978-3-642-22110-1_40 10.1145/2491956.2462192 10.1145/604131.604140 10.1145/2813885.2737977 10.3115/981863.981871 10.1145/3062341.3062351 10.1145/2588555.2612177 10.3115/1220175.1220290 10.1007/11687238_44 10.1145/1926385.1926423 |
| ContentType | Journal Article |
| DBID | AAYXX CITATION |
| DOI | 10.1145/3133887 |
| DatabaseName | CrossRef |
| DatabaseTitle | CrossRef |
| DatabaseTitleList | CrossRef |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISSN | 2475-1421 |
| EndPage | 26 |
| ExternalDocumentID | 10_1145_3133887 |
| GroupedDBID | AAKMM AAYFX AAYXX ACM AEFXT AEJOY AIKLT AKRVB ALMA_UNASSIGNED_HOLDINGS CITATION EBS GUFHI LHSKQ M~E OK1 ROL |
| ID | FETCH-LOGICAL-c258t-7992b59876079a4469beb67a2e1f9f353b7d016b04b0654c97d3e70015a924713 |
| ISICitedReferencesCount | 150 |
| ISSN | 2475-1421 |
| IngestDate | Sat Nov 29 07:49:00 EST 2025 Tue Nov 18 22:38:55 EST 2025 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | OOPSLA |
| Language | English |
| LinkModel | OpenURL |
| MergedId | FETCHMERGED-LOGICAL-c258t-7992b59876079a4469beb67a2e1f9f353b7d016b04b0654c97d3e70015a924713 |
| OpenAccessLink | http://dl.acm.org/ft_gateway.cfm?id=3133887&type=pdf |
| PageCount | 26 |
| ParticipantIDs | crossref_citationtrail_10_1145_3133887 crossref_primary_10_1145_3133887 |
| PublicationCentury | 2000 |
| PublicationDate | 2017-10-01 |
| PublicationDateYYYYMMDD | 2017-10-01 |
| PublicationDate_xml | – month: 10 year: 2017 text: 2017-10-01 day: 01 |
| PublicationDecade | 2010 |
| PublicationTitle | Proceedings of ACM on programming languages |
| PublicationYear | 2017 |
| SSID | ssj0001934839 |
| Score | 2.5153334 |
| SourceID | crossref |
| SourceType | Enrichment Source Index Database |
| StartPage | 1 |
| Title | SQLizer: query synthesis from natural language |
| Volume | 1 |
| WOSCitedRecordID | wos000688014000019&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVHPJ databaseName: ROAD: Directory of Open Access Scholarly Resources customDbUrl: eissn: 2475-1421 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0001934839 issn: 2475-1421 databaseCode: M~E dateStart: 20170101 isFulltext: true titleUrlDefault: https://road.issn.org providerName: ISSN International Centre |
| linkProvider | ISSN International Centre |
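
The abstract describes a refinement loop that starts from a parsed query sketch and alternates between probabilistic type inhabitation and automated sketch repair. The Python sketch below is only a toy illustration of that control flow, not SQLIZER's implementation: the schema, the `Sketch` shape, the string-similarity score, the hint-swapping repair rule, and the `threshold` parameter are all assumptions made for this example.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

# Hypothetical toy schema (table name -> column names); not taken from the paper.
SCHEMA = {
    "papers": ["title", "year", "venue"],
    "authors": ["name", "affiliation"],
}


@dataclass(frozen=True)
class Sketch:
    """A query sketch whose holes carry natural-language hints."""
    table_hint: str
    column_hint: str


def similarity(hint: str, name: str) -> float:
    """String similarity in [0, 1]; a stand-in for a real confidence model."""
    return SequenceMatcher(None, hint.lower(), name.lower()).ratio()


def inhabit(sketch: Sketch, schema: dict) -> list:
    """Enumerate well-typed completions of the sketch with a confidence score."""
    completions = []
    for table, columns in schema.items():
        for column in columns:  # a column is only paired with its own table
            score = (similarity(sketch.table_hint, table)
                     * similarity(sketch.column_hint, column))
            completions.append((score, f"SELECT {column} FROM {table}"))
    return sorted(completions, reverse=True)


def repair(sketch: Sketch) -> list:
    """Propose alternative sketches; this toy rule only swaps the two hints."""
    return [Sketch(sketch.column_hint, sketch.table_hint)]


def synthesize(sketch, schema, threshold=0.5, max_rounds=3, top_k=5):
    """Alternate inhabitation and repair; return the top_k (score, query) pairs."""
    worklist, seen, best = [sketch], set(), {}
    for _ in range(max_rounds):
        if not worklist:
            break
        current = worklist.pop(0)
        if current in seen:
            continue
        seen.add(current)
        completions = inhabit(current, schema)
        for score, query in completions:
            best[query] = max(best.get(query, 0.0), score)
        if completions and completions[0][0] >= threshold:
            break  # confident enough: stop refining
        worklist.extend(repair(current))  # low confidence: try repaired sketches
    ranked = sorted(((s, q) for q, s in best.items()), reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    # A misparsed sketch in which the table and column hints are swapped.
    for score, query in synthesize(Sketch("title", "papers"), SCHEMA):
        print(f"{score:.2f}  {query}")
```

In this toy run the initial sketch yields only low-confidence completions, so the loop falls back to the repaired sketch, after which `SELECT title FROM papers` ranks first. Returning a ranked list rather than a single query mirrors the top-5 evaluation reported in the abstract.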