SQLizer: query synthesis from natural language

Bibliographic Details
Published in: Proceedings of the ACM on Programming Languages, Vol. 1, No. OOPSLA, pp. 1-26
Main Authors: Yaghmazadeh, Navid; Wang, Yuepeng; Dillig, Isil; Dillig, Thomas
Format: Journal Article
Language: English
Published: 2017-10-01
ISSN: 2475-1421
Abstract: This paper presents a new technique for automatically synthesizing SQL queries from natural language (NL). At the core of our technique is a new NL-based program synthesis methodology that combines semantic parsing techniques from the NLP community with type-directed program synthesis and automated program repair. Starting with a program sketch obtained using standard parsing techniques, our approach involves an iterative refinement loop that alternates between probabilistic type inhabitation and automated sketch repair. We use the proposed idea to build an end-to-end system called SQLIZER that can synthesize SQL queries from natural language. Our method is fully automated, works for any database without requiring additional customization, and does not require users to know the underlying database schema. We evaluate our approach on over 450 natural language queries concerning three different databases, namely MAS, IMDB, and YELP. Our experiments show that the desired query is ranked within the top 5 candidates in close to 90% of the cases and that SQLIZER outperforms NALIR, a state-of-the-art tool that won a best paper award at VLDB'14.
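The refinement loop described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the schema, the foreign key, the `Sketch` fields, the string-similarity scoring, the `JOIN_PENALTY` and `THRESHOLD` constants, and the single "add a join" repair are all assumptions made for illustration. It only shows the general shape of the idea: complete a query sketch against the schema and score each completion (a stand-in for probabilistic type inhabitation), and if no completion is confident enough, repair the sketch and try again.

```python
from dataclasses import dataclass, replace
from difflib import SequenceMatcher

# Hypothetical toy schema (not from the paper): table -> columns.
SCHEMA = {
    "publication": ["title", "year", "author_id"],
    "author": ["id", "name"],
}
# Hypothetical foreign keys: (table, column) -> (table, column).
FOREIGN_KEYS = [(("publication", "author_id"), ("author", "id"))]
JOIN_PENALTY = 0.9  # assumed: slightly prefer queries without joins
THRESHOLD = 0.5     # assumed confidence cutoff for accepting a completion


@dataclass(frozen=True)
class Sketch:
    """A query sketch whose hints come from the NL text, not the schema."""
    table_hint: str
    select_hint: str
    filter_hint: str
    filter_value: str
    join: bool = False


def sim(hint, name):
    """String similarity in [0, 1], used here as a stand-in confidence."""
    return SequenceMatcher(None, hint.lower(), name.lower()).ratio()


def inhabit(sketch):
    """Enumerate schema-consistent completions, highest score first."""
    if sketch.join:
        sources = [
            ([t1, t2], f"{t1} JOIN {t2} ON {t1}.{c1} = {t2}.{c2}", JOIN_PENALTY)
            for (t1, c1), (t2, c2) in FOREIGN_KEYS
        ]
    else:
        sources = [([t], t, 1.0) for t in SCHEMA]
    results = []
    for tables, from_clause, penalty in sources:
        table_score = max(sim(sketch.table_hint, t) for t in tables)
        columns = [(t, c) for t in tables for c in SCHEMA[t]]
        for st, sc in columns:          # candidate for the SELECT hole
            for ft, fc in columns:      # candidate for the WHERE hole
                score = (penalty * table_score
                         * sim(sketch.select_hint, sc)
                         * sim(sketch.filter_hint, fc))
                sql = (f"SELECT {st}.{sc} FROM {from_clause} "
                       f"WHERE {ft}.{fc} = '{sketch.filter_value}'")
                results.append((score, sql))
    results.sort(key=lambda r: r[0], reverse=True)
    return results


def repair(sketch):
    """Single toy repair: introduce a join. Returns None if none is left."""
    return None if sketch.join else replace(sketch, join=True)


def synthesize(sketch, top_k=5):
    """Alternate between inhabitation and repair until confident."""
    while sketch is not None:
        ranked = inhabit(sketch)
        if ranked and ranked[0][0] >= THRESHOLD:
            return ranked[:top_k]
        sketch = repair(sketch)
    return []


# Example: "titles of publications whose author name is Isil Dillig".
sketch = Sketch("publication", "title", "name", "Isil Dillig")
for score, sql in synthesize(sketch):
    print(f"{score:.2f}  {sql}")
```

In this toy run the initial sketch has no confident completion, because no single table has both a title-like and a name-like column, so the loop applies the join repair and then ranks completions over the joined tables. Returning the top 5 candidates mirrors the top-5 ranking reported in the evaluation.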
Author affiliations:
  1. Yaghmazadeh, Navid (University of Texas at Austin, USA)
  2. Wang, Yuepeng (University of Texas at Austin, USA)
  3. Dillig, Isil (University of Texas at Austin, USA)
  4. Dillig, Thomas (University of Texas at Austin, USA)
DOI: 10.1145/3133887
Peer reviewed: Yes
Open access: Yes
Open access link: http://dl.acm.org/ft_gateway.cfm?id=3133887&type=pdf