Detailed Bibliography
| Title: |
Generating highly customizable Python code for data processing with large language models |
| Authors: |
Trummer, Immanuel |
| Source: |
VLDB Journal: International Journal on Very Large Data Bases; Mar. 2025, Vol. 34, Issue 2, pp. 1-19 (19 pp.) |
| Abstract: |
CARD (Coding Assistant for Relational Data analysis) generates Python code that processes relational queries on raw data. Users can customize generated code via natural language instructions, e.g., by instructing the system to use specific libraries or produce certain output. Internally, CARD uses large language models such as GPT-4o to synthesize code. CARD automatically constructs prompts describing code generation tasks to the language models. Those prompts contain information on data format, customization requirements, as well as processing plans, generated by CARD’s scenario-specific query planner. CARD automatically tests generated code by comparing its output to the output of a reference SQL engine. In case of inconsistencies, CARD re-generates code with a certain degree of randomization. Furthermore, CARD can automatically generate libraries of code samples for specific customization scenarios in a pre-processing step, leveraging those samples at run time for few-shot learning. The experiments show that CARD generates accurate code in the vast majority of scenarios. Furthermore, current trends in language models are likely to benefit CARD’s performance in the future. [ABSTRACT FROM AUTHOR] |
|
Copyright of VLDB Journal International Journal on Very Large Data Bases is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.) |
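The abstract describes a generate, test, and regenerate loop. CARD runs the generated code, compares its output with the output of a reference SQL engine, and re-generates with some randomization when the two disagree. The following is a minimal Python sketch of that loop only, not CARD's actual implementation. It makes several assumptions. SQLite (Python's `sqlite3` module) stands in as the reference engine. The data lives in a single hypothetical table `t(name, value)`. A hypothetical `generate(prompt, temperature)` callable stands in for the language model and returns Python source that defines `run(rows)`. The retry temperature of 0.7 and the attempt limit are illustrative, not values from the paper.

```python
import sqlite3
from typing import Callable, List, Optional, Tuple

Row = Tuple[str, int]
# Hypothetical LLM stand-in: (prompt, temperature) -> Python source defining run(rows).
Generator = Callable[[str, float], str]


def reference_result(rows: List[Row], sql: str) -> list:
    """Compute the expected result with SQLite as the reference SQL engine."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (name TEXT, value INTEGER)")  # assumed schema
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = con.execute(sql).fetchall()
    con.close()
    return result


def run_generated(source: str, rows: List[Row]) -> list:
    """Execute generated source and call the run(rows) function it defines."""
    namespace: dict = {}
    exec(source, namespace)
    return list(namespace["run"](rows))


def generate_and_check(generate: Generator, prompt: str, rows: List[Row],
                       sql: str, max_tries: int = 5) -> Optional[str]:
    """Return the first generated source whose output matches the reference, else None."""
    # Compare as sorted lists: without ORDER BY, SQL row order is not guaranteed.
    expected = sorted(reference_result(rows, sql))
    for attempt in range(max_tries):
        # Assumption: first try is deterministic, retries add randomization.
        temperature = 0.0 if attempt == 0 else 0.7
        source = generate(prompt, temperature)
        try:
            actual = sorted(run_generated(source, rows))
        except Exception:
            continue  # crashing code counts as a failed attempt
        if actual == expected:
            return source
    return None


# Toy data and query for the demo (hypothetical).
ROWS: List[Row] = [("a", 1), ("b", 2), ("a", 3)]
SQL = "SELECT name, SUM(value) FROM t GROUP BY name"

# Wrong answer: counts rows per name instead of summing values.
BUGGY = """
def run(rows):
    counts = {}
    for name, _ in rows:
        counts[name] = counts.get(name, 0) + 1
    return list(counts.items())
"""

# Correct answer: sums values per name.
CORRECT = """
def run(rows):
    totals = {}
    for name, value in rows:
        totals[name] = totals.get(name, 0) + value
    return list(totals.items())
"""


def fake_generator(prompt: str, temperature: float) -> str:
    """Pretend model: wrong at temperature 0, right once retries are randomized."""
    return CORRECT if temperature > 0 else BUGGY


if __name__ == "__main__":
    accepted = generate_and_check(fake_generator, "total value per name", ROWS, SQL)
    print("accepted correct code:", accepted == CORRECT)
```

The sketch compares results as sorted lists so that row order does not matter. A fuller checker would also need to handle floating-point tolerance, NULLs, and differing column types, and this sketch does none of these.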
| Database: |
Complementary Index |