Bibliographic Details
| Title: |
Refining Zero-Shot Text-to-SQL Benchmarks via Prompt Strategies with Large Language Models. |
| Authors: |
Zhou, Ruikang; Zhang, Fan |
| Source: |
Applied Sciences (2076-3417); May 2025, Vol. 15 Issue 10, p5306, 23p |
| Subject Terms: |
LANGUAGE models, DATABASES, GENERATIVE pre-trained transformers, NATURAL languages, SQL |
| Abstract: |
Text-to-SQL leverages large language models (LLMs) for natural language database queries, yet existing benchmarks like BIRD (12,751 question–SQL pairs, 95 databases) suffer from inconsistencies (e.g., 30% of queries misalign with SQL outputs) and ambiguities that impair LLM evaluation. This study refines such datasets by distilling logically sound question–SQL pairs and enhancing table schemas, yielding a benchmark of 146 high-complexity tasks across 11 domains. We assess GPT-4o, GPT-4o-Mini, Qwen-2.5-Instruct, Llama 3 70B, DPSK-v3, and O1-Preview in zero-shot scenarios; the first five achieve average accuracies of 51.23%, 41.65%, 44.25%, 47.80%, and 49.10%, respectively, while O1-Preview reaches a peak of 78.08%. Prompt-based strategies improve performance by up to 4.78%, addressing issues like poor domain adaptability and inconsistent training data interpretation. Error-annotated datasets further reveal LLM limitations. This refined benchmark ensures robust evaluation of logical reasoning, supporting reliable NLP-driven database systems. [ABSTRACT FROM AUTHOR] |
| Database: |
Complementary Index |