Zero-Shot Prompting Strategies for Table Question Answering in Portuguese: An Exploration of Prompt-Based Approaches for Text2SQL in the Portuguese Language

Bibliographic details
Title: Zero-Shot Prompting Strategies for Table Question Answering in Portuguese: An Exploration of Prompt-Based Approaches for Text2SQL in the Portuguese Language
Authors: Jannuzzi, Marcelo Poles
Contributors: Castelli, Mauro, Peres, Fernando Augusto Junqueira
Publication Year: 1483
Collection: Repositório da Universidade Nova de Lisboa (UNL)
Subject Terms: Natural Language Processing, Text2SQL, Table Question Answering, Large Language Models, GPT-3, GPT-4, Zero-Shot Prompting, Portuguese Language, Spider Benchmark, Natural Language Interface for Databases, Processamento de Linguagem Natural, Língua Portuguesa, Conjunto de Dados Spider, Interface de Linguagem Natural para Bancos de Dados, Domínio/Área Científica::Ciências Naturais::Ciências da Computação e da Informação
Description: Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science ; This thesis explores the application of zero-shot prompting strategies for table question answering (TQA) in Portuguese, focusing specifically on the Text2SQL task. This task involves translating questions posed in natural language into SQL queries that can be executed against a database to answer the original question. Given the popularity of relational databases across various domains, advancements in this field can substantially impact the accessibility and democratization of data as simpler and more intuitive interfaces for database interaction are developed. Despite this significant potential, progress in developing Portuguese TQA solutions remains limited. We propose a previously unexplored approach to the Text2SQL task in Portuguese by leveraging Large Language Models (LLMs), specifically the GPT-3.5 and GPT-4 models, through zero-shot prompting. The primary objectives are to assess the effectiveness of such LLMs in this task and to identify the most suitable prompt styles. These are evaluated using a Portuguese translation of the popular Spider Text2SQL benchmark. Results from this work reveal that, while not outperforming state-of-the-art models, our proposed approach can generate adequate SQL queries to answer Portuguese-language questions about various databases, particularly when using GPT-4. The findings suggest that including schema information and database content in the prompts is critical for satisfactory outcomes. Furthermore, we point out issues with the automatic evaluation process used in the Spider benchmark, which may lead to underestimating the performance of the approaches tested here.
; This thesis explores the application of zero-shot prompting strategies to answer questions in Portuguese about information contained in tables (a field known as Table Question Answering, TQA), with a specific focus on Text2SQL. ...
Document Type: master thesis
Language: English
Relation: http://hdl.handle.net/10362/159406; 203377222
Availability: http://hdl.handle.net/10362/159406
Rights: embargoedAccess ; http://creativecommons.org/licenses/by/4.0/
Accession Number: edsbas.F509F23
Database: BASE
FullText Text:
  Availability: 0
CustomLinks:
  – Url: http://hdl.handle.net/10362/159406#
    Name: EDS - BASE (s4221598)
    Category: fullText
    Text: View record from BASE
  – Url: https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=EBSCO&SrcAuth=EBSCO&DestApp=WOS&ServiceName=TransferToWoS&DestLinkType=GeneralSearchSummary&Func=Links&author=Jannuzzi%20MP
    Name: ISI
    Category: fullText
    Text: Find this article in Web of Science
    Icon: https://imagesrvr.epnet.com/ls/20docs.gif
    MouseOverText: Find this article in Web of Science
Header DbId: edsbas
DbLabel: BASE
An: edsbas.F509F23
RelevancyScore: 674
AccessLevel: 3
PubType: Dissertation/ Thesis
PubTypeId: dissertation
PreciseRelevancyScore: 673.574829101563
IllustrationInfo
Items – Name: Title
  Label: Title
  Group: Ti
  Data: Zero-Shot Prompting Strategies for Table Question Answering in Portuguese: An Exploration of Prompt-Based Approaches for Text2SQL in the Portuguese Language
– Name: Author
  Label: Authors
  Group: Au
  Data: <searchLink fieldCode="AR" term="%22Jannuzzi%2C+Marcelo+Poles%22">Jannuzzi, Marcelo Poles</searchLink>
– Name: Author
  Label: Contributors
  Group: Au
  Data: Castelli, Mauro<br />Peres, Fernando Augusto Junqueira
– Name: DatePubCY
  Label: Publication Year
  Group: Date
  Data: 1483
– Name: Subset
  Label: Collection
  Group: HoldingsInfo
  Data: Repositório da Universidade Nova de Lisboa (UNL)
– Name: Subject
  Label: Subject Terms
  Group: Su
  Data: <searchLink fieldCode="DE" term="%22Natural+Language+Processing%22">Natural Language Processing</searchLink><br /><searchLink fieldCode="DE" term="%22Text2SQL%22">Text2SQL</searchLink><br /><searchLink fieldCode="DE" term="%22Table+Question+Answering%22">Table Question Answering</searchLink><br /><searchLink fieldCode="DE" term="%22Large+Language+Models%22">Large Language Models</searchLink><br /><searchLink fieldCode="DE" term="%22GPT-3%22">GPT-3</searchLink><br /><searchLink fieldCode="DE" term="%22GPT-4%22">GPT-4</searchLink><br /><searchLink fieldCode="DE" term="%22Zero-Shot+Prompting%22">Zero-Shot Prompting</searchLink><br /><searchLink fieldCode="DE" term="%22Portuguese+Language%22">Portuguese Language</searchLink><br /><searchLink fieldCode="DE" term="%22Spider+Benchmark%22">Spider Benchmark</searchLink><br /><searchLink fieldCode="DE" term="%22Natural+Language+Interface+for+Databases%22">Natural Language Interface for Databases</searchLink><br /><searchLink fieldCode="DE" term="%22Processamento+de+Linguagem+Natural%22">Processamento de Linguagem Natural</searchLink><br /><searchLink fieldCode="DE" term="%22Língua+Portuguesa%22">Língua Portuguesa</searchLink><br /><searchLink fieldCode="DE" term="%22Conjunto+de+Dados+Spider%22">Conjunto de Dados Spider</searchLink><br /><searchLink fieldCode="DE" term="%22Interface+de+Linguagem+Natural+para+Bancos+de+Dados%22">Interface de Linguagem Natural para Bancos de Dados</searchLink><br /><searchLink fieldCode="DE" term="%22Domínio%2FÁrea+Científica%3A%3ACiências+Naturais%3A%3ACiências+da+Computação+e+da+Informação%22">Domínio/Área Científica::Ciências Naturais::Ciências da Computação e da Informação</searchLink>
– Name: Abstract
  Label: Description
  Group: Ab
  Data: Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science ; This thesis explores the application of zero-shot prompting strategies for table question answering (TQA) in Portuguese, focusing specifically on the Text2SQL task. This task involves translating questions posed in natural language into SQL queries that can be executed against a database to answer the original question. Given the popularity of relational databases across various domains, advancements in this field can substantially impact the accessibility and democratization of data as simpler and more intuitive interfaces for database interaction are developed. Despite this significant potential, progress in developing Portuguese TQA solutions remains limited. We propose a previously unexplored approach to the Text2SQL task in Portuguese by leveraging Large Language Models (LLMs), specifically the GPT-3.5 and GPT-4 models, through zero-shot prompting. The primary objectives are to assess the effectiveness of such LLMs in this task and to identify the most suitable prompt styles. These are evaluated using a Portuguese translation of the popular Spider Text2SQL benchmark. Results from this work reveal that, while not outperforming state-of-the-art models, our proposed approach can generate adequate SQL queries to answer Portuguese-language questions about various databases, particularly when using GPT-4. The findings suggest that including schema information and database content in the prompts is critical for satisfactory outcomes. Furthermore, we point out issues with the automatic evaluation process used in the Spider benchmark, which may lead to underestimating the performance of the approaches tested here.
; This thesis explores the application of zero-shot prompting strategies to answer questions in Portuguese about information contained in tables (a field known as Table Question Answering, TQA), with a specific focus on Text2SQL. ...
– Name: TypeDocument
  Label: Document Type
  Group: TypDoc
  Data: master thesis
– Name: Language
  Label: Language
  Group: Lang
  Data: English
– Name: NoteTitleSource
  Label: Relation
  Group: SrcInfo
  Data: http://hdl.handle.net/10362/159406; 203377222
– Name: URL
  Label: Availability
  Group: URL
  Data: http://hdl.handle.net/10362/159406
– Name: Copyright
  Label: Rights
  Group: Cpyrght
  Data: embargoedAccess ; http://creativecommons.org/licenses/by/4.0/
– Name: AN
  Label: Accession Number
  Group: ID
  Data: edsbas.F509F23
PLink https://erproxy.cvtisr.sk/sfx/access?url=https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.F509F23
RecordInfo BibRecord:
  BibEntity:
    Languages:
      – Text: English
    Subjects:
      – SubjectFull: Natural Language Processing
        Type: general
      – SubjectFull: Text2SQL
        Type: general
      – SubjectFull: Table Question Answering
        Type: general
      – SubjectFull: Large Language Models
        Type: general
      – SubjectFull: GPT-3
        Type: general
      – SubjectFull: GPT-4
        Type: general
      – SubjectFull: Zero-Shot Prompting
        Type: general
      – SubjectFull: Portuguese Language
        Type: general
      – SubjectFull: Spider Benchmark
        Type: general
      – SubjectFull: Natural Language Interface for Databases
        Type: general
      – SubjectFull: Processamento de Linguagem Natural
        Type: general
      – SubjectFull: Língua Portuguesa
        Type: general
      – SubjectFull: Conjunto de Dados Spider
        Type: general
      – SubjectFull: Interface de Linguagem Natural para Bancos de Dados
        Type: general
      – SubjectFull: Domínio/Área Científica::Ciências Naturais::Ciências da Computação e da Informação
        Type: general
    Titles:
      – TitleFull: Zero-Shot Prompting Strategies for Table Question Answering in Portuguese: An Exploration of Prompt-Based Approaches for Text2SQL in the Portuguese Language
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Jannuzzi, Marcelo Poles
      – PersonEntity:
          Name:
            NameFull: Castelli, Mauro
      – PersonEntity:
          Name:
            NameFull: Peres, Fernando Augusto Junqueira
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 01
              M: 01
              Type: published
              Y: 1483
          Identifiers:
            – Type: issn-locals
              Value: edsbas
            – Type: issn-locals
              Value: edsbas.oa
ResultId 1