Automatic annotation for accurate text-to-SPARQL translation using hybrid encoder–decoder models.

Saved in:
Detailed bibliography
Title: Automatic annotation for accurate text-to-SPARQL translation using hybrid encoder–decoder models.
Authors: Chen, Yi-Hui (AUTHOR); Lu, Eric Jui-Lin (AUTHOR), jllu@nchu.edu.tw; Hsu, Cheng-Hsien (AUTHOR)
Source: Journal of Supercomputing. Jan 2026, Vol. 82, Issue 1, p1-41. 41p.
Abstract: Knowledge graph question answering (KGQA) systems translate natural language questions into structured query languages (e.g., SQL/SPARQL). With advances in sequence-to-sequence models and large pre-trained language models (LPLMs), neural machine translation (NMT) has become a prevailing approach for text-to-SPARQL. This study leverages the large-scale pre-trained language model T5 to obtain rich, transferable representations for SPARQL query generation and addresses translation errors that T5 may produce during decoding. Through an ablation study, we show that manually provided annotations (e.g., partial answers or gold entities) can effectively mitigate such errors; however, they require technical expertise and are therefore impractical for real-world deployment. To overcome this limitation, we propose a T5-based framework that integrates an MHC-LSTM architecture with an automatic annotation and correction mechanism. The automatic annotator, combined with the correction mechanism, yields the best overall results with T5-MHC-LSTM, narrowing the gap to manually annotated performance. Empirically, the proposed method achieves F1-measures of 89.63% and 95.58% on QALD-9 and LC-QuAD 1.0 for text-to-text translation, and 59% and 81% in end-to-end evaluations, respectively, surpassing existing KGQA systems. These findings confirm that combining LPLMs, MHC-LSTM, and automated annotation with correction substantially enhances SPARQL query generation and overall KGQA effectiveness. [ABSTRACT FROM AUTHOR] (An illustrative T5 decoding sketch follows this record.)
Database: Academic Search Index
ISSN: 0920-8542
DOI: 10.1007/s11227-025-08127-4
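
For readers who want to experiment with the general approach the abstract describes, the sketch below shows the basic shape of T5-based text-to-SPARQL decoding using the Hugging Face transformers library. It is a minimal sketch under stated assumptions, not the paper's T5-MHC-LSTM pipeline: the checkpoint name ("t5-base"), the "translate to sparql:" task prefix, the "| entity: ..." annotation format, and the example question and query are all invented for illustration, and the paper's MHC-LSTM head and automatic annotation/correction mechanism are not reproduced here.

    # Minimal text-to-SPARQL sketch with a T5 encoder-decoder (Hugging Face
    # transformers). Hypothetical setup: a generic checkpoint stands in for a
    # model fine-tuned on question/SPARQL pairs such as QALD-9 or LC-QuAD 1.0.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    MODEL_NAME = "t5-base"  # assumption: swap in a text-to-SPARQL fine-tuned checkpoint

    tokenizer = T5Tokenizer.from_pretrained(MODEL_NAME)
    model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME)

    # Hypothetical annotated input: the question plus a gold-entity hint, in the
    # spirit of the paper's annotation idea (this format is invented, not theirs).
    question = "Who is the mayor of Berlin?"
    annotated = f"translate to sparql: {question} | entity: dbr:Berlin"

    # Encode, decode with beam search, and strip special tokens from the output.
    inputs = tokenizer(annotated, return_tensors="pt")
    output_ids = model.generate(**inputs, max_length=128, num_beams=4)
    sparql = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    print(sparql)  # with a fine-tuned model, e.g. SELECT ?m WHERE { dbr:Berlin dbo:mayor ?m }

A generic "t5-base" checkpoint will not emit valid SPARQL; the point of the sketch is only the encode/generate/decode flow and where an annotation could be prepended to the input sequence.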