TQWERE: Transformer-Based SQL Query Executor

Bibliographic details
Title: TQWERE: Transformer-Based SQL Query Executor
Authors: Nir Regev, Asaf Shabtai, Lior Rokach
Source: Natural Language Processing, Information Retrieval and AI Trends 2025, pp. 115-131
Publisher information: Academy & Industry Research Collaboration Center, 2025.
Publication year: 2025
Description: Recent developments in large language models (LLMs) trained on large-scale unstructured textual data have produced high-achieving models. However, it remains a challenge to train an LLM on vast structured (tabular) data for the task of understanding the information captured in the data and answering questions about it. We propose a novel method, TQwerE, for approximating the results of aggregated SQL queries over large data sets. Our main focus is reducing query latency and incurred costs. Moreover, since we focus on large data sets, the majority of models that scan raw data are not applicable. Instead, our method fine-tunes Jurassic-2 to learn the relations between aggregated SQL queries and their results without referring directly to the underlying raw data. We demonstrate TQwerE’s ability to approximate aggregated queries with state-of-the-art accuracy and speed. We evaluated TQwerE on twelve datasets, and our results demonstrate its superiority over the state-of-the-art methods.
Document type: Article
DOI: 10.5121/csit.2024.150209
Accession number: edsair.doi...........5e64252f2eeb8c8a0739c9dd4cb80a08
Database: OpenAIRE