Enhancing security in text-to-SQL systems: A novel dataset and agent-based framework.

Bibliographic Details
Title: Enhancing security in text-to-SQL systems: A novel dataset and agent-based framework.
Authors: Chafik, Salmane; Ezzini, Saad; Berrada, Ismail
Source: Natural Language Processing (ISSN 2977-0424); Nov 2025, Vol. 31, Issue 6, pp. 1399-1422 (24 pages)
Subject Terms: LANGUAGE models, DATABASES, TRANSFORMER models, NATURAL languages, SECURITY systems
Abstract: This paper explores the significant advancements in generating Structured Query Language (SQL) from natural language, primarily driven by Large Language Models (LLMs). These advancements have led to the development of sophisticated text-to-SQL integrated applications, enabling easier database (DB) querying for users unfamiliar with SQL syntax using natural language queries. However, reliance on LLMs exposes these applications to potential attacks through the introduction of malicious prompts or by compromising models with malicious data during the training phase. Such attacks pose severe risks, including unauthorized data access or even complete DB destruction upon success. To address these concerns, we introduce a novel large-scale dataset comprising malicious and safe prompts along with their corresponding SQL queries, enabling model fine-tuning on malicious query detection tasks. Moreover, we propose the implementation of two transformer-based classification solutions to aid in the detection of malicious attacks. Finally, we present a secure agent-based text-to-SQL architecture that incorporates these solutions to enhance overall system security, resulting in a 70% security enhancement overall compared to solely relying on a conventional text-to-SQL model. [ABSTRACT FROM AUTHOR]
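The guarded pipeline the abstract describes can be illustrated with a minimal sketch. It assumes that one classifier screens the incoming prompt and the other screens the generated SQL, a split the abstract does not spell out. Every name here (`guarded_text_to_sql`, `toy_prompt_check`, `toy_sql_check`, `toy_generator`) is hypothetical, and the keyword checks are toy placeholders standing in for the fine-tuned transformer classifiers, not the authors' method.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stand-ins: each classifier returns True when it flags its input
# as malicious. In a real system these would be fine-tuned transformer models.
Classifier = Callable[[str], bool]
Generator = Callable[[str], str]


@dataclass
class GuardedResult:
    allowed: bool               # True only if both checks passed
    stage: str                  # "prompt", "sql", or "ok"
    sql: Optional[str] = None   # generated SQL, if generation was reached


def guarded_text_to_sql(
    prompt: str,
    prompt_is_malicious: Classifier,
    generate_sql: Generator,
    sql_is_malicious: Classifier,
) -> GuardedResult:
    """Screen the prompt, generate SQL, then screen the SQL before release."""
    if prompt_is_malicious(prompt):
        return GuardedResult(allowed=False, stage="prompt")
    sql = generate_sql(prompt)
    if sql_is_malicious(sql):
        return GuardedResult(allowed=False, stage="sql", sql=sql)
    return GuardedResult(allowed=True, stage="ok", sql=sql)


# Toy placeholders (assumptions for illustration only).
def toy_prompt_check(prompt: str) -> bool:
    return "ignore previous instructions" in prompt.lower()


def toy_sql_check(sql: str) -> bool:
    upper = sql.upper()
    return any(kw in upper for kw in ("DROP ", "DELETE ", "TRUNCATE "))


def toy_generator(prompt: str) -> str:
    # Pretend model: emits a destructive query when asked to "remove" data.
    if "remove" in prompt.lower():
        return "DROP TABLE users;"
    return "SELECT name FROM users;"


if __name__ == "__main__":
    for text in ("List user names", "Please remove everything"):
        print(text, "->", guarded_text_to_sql(
            text, toy_prompt_check, toy_generator, toy_sql_check))
```

Passing the classifiers and the generator in as plain callables keeps the control flow independent of any particular model, so the toy checks could be swapped for real classifiers without changing the pipeline function.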
Copyright of Natural Language Processing (29770424) is the property of Cambridge University Press and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Complementary Index
ISSN: 2977-0424
DOI: 10.1017/nlp.2025.10008