Detailed bibliography
| Title: |
Cloud-assisted LLM-enhanced datasets for AST hierarchy-aware code summarization model. |
| Authors: |
Zhang, Junsan, Yan, Yudie, Han, Junxiao, Lu, Ao, Guo, Juncai, Pourzamani, Javad |
| Source: |
Journal of Cloud Computing (2192-113X); 2/14/2026, Vol. 15 Issue 1, p1-18, 18p |
| Abstrakt: |
Code summarization is an important task in software engineering that helps developers understand and maintain code by generating natural language summaries. Existing approaches predominantly rely on single models and face a dilemma: directly deploying large language models (LLMs) incurs high training costs, while lightweight models specialized for summarization are constrained by the quality of their training data and by their limited ability to capture the complex structural semantics of code. This highlights the urgent need for synergistic collaboration between large and small models in cloud computing environments. To address these issues, this paper proposes a cloud-assisted code summarization framework. First, we achieve code enhancement by invoking cloud-deployed LLM services. The specific workflow uses preset prompt templates to guide the model in evaluating code quality and automatically repairing defects based on its feedback, thereby constructing the high-quality datasets Java-QE and Python-QE. Second, for efficient edge deployment, we introduce HiSum, a lightweight AST Hierarchy-Aware Code Summarization model. HiSum transforms code ASTs into Directed Syntax Graphs (DSGs) to preserve structural semantics, encodes them via a directed graph convolutional network, and decodes them to improve summary quality. Experimental results show that our framework significantly enhances code summarization performance. On the constructed Java-QE and Python-QE datasets, the HiSum model achieves notable improvements over state-of-the-art baselines in BLEU, METEOR, and ROUGE-L metrics (increases of 1.06%, 1.98%, 3.12% for Java-QE, and 1.46%, 3.24%, 2.20% for Python-QE, respectively). This research provides a solution that uses cloud LLM-assisted data enhancement to empower a lightweight hierarchy-aware model. [ABSTRACT FROM AUTHOR] |
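The abstract's cloud-assisted enhancement step (prompt the LLM to evaluate quality, then repair on negative feedback) can be sketched as below. This is a minimal illustration only: the prompt wording, the `OK`/`LOW` verdict protocol, and the injected `llm_call` callable are all assumptions standing in for the paper's preset prompt templates and cloud LLM service.

```python
# Sketch of an LLM-assisted dataset enhancement loop (hypothetical
# prompt wording and verdict protocol; the paper's templates differ).

EVAL_PROMPT = (
    "You are a code reviewer. Rate the quality of this {lang} snippet "
    "as OK or LOW and explain any defects:\n{code}"
)
REPAIR_PROMPT = (
    "Repair the defects in this {lang} snippet and return only the "
    "corrected code:\n{code}"
)

def enhance_snippet(code, lang, llm_call):
    """Ask a (cloud-deployed) LLM to judge `code`; repair it if judged LOW.

    `llm_call` is any callable prompt -> response string, so a real
    service client or a test stub can be injected.
    """
    verdict = llm_call(EVAL_PROMPT.format(lang=lang, code=code))
    if verdict.strip().upper().startswith("LOW"):
        return llm_call(REPAIR_PROMPT.format(lang=lang, code=code))
    return code  # judged high quality; keep unchanged

def enhance_dataset(snippets, lang, llm_call):
    """Map the evaluate-and-repair loop over a corpus (Java-QE / Python-QE style)."""
    return [enhance_snippet(s, lang, llm_call) for s in snippets]
```

Injecting `llm_call` keeps the workflow testable offline and lets the same loop target any cloud LLM endpoint.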
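The abstract does not specify how HiSum builds its Directed Syntax Graphs. As an illustrative sketch only, assuming a DSG's edges are directed parent-to-child AST edges (my assumption, not the paper's definition) and using Python's built-in `ast` module as the parser:

```python
import ast

def ast_to_dsg(code):
    """Parse `code` into an AST and flatten it into a directed graph.

    Returns (labels, edges): labels[i] is the AST node type of node i
    in depth-first preorder, and edges holds directed parent -> child
    index pairs, so the tree hierarchy survives as graph structure.
    """
    tree = ast.parse(code)
    labels, edges = [], []

    def visit(node, parent):
        idx = len(labels)
        labels.append(type(node).__name__)
        if parent is not None:
            edges.append((parent, idx))  # directed: parent -> child
        for child in ast.iter_child_nodes(node):
            visit(child, idx)

    visit(tree, None)
    return labels, edges
```

A directed graph convolutional encoder would then propagate node features along these edges before a decoder produces the summary; those stages are beyond this sketch.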
|
Copyright of Journal of Cloud Computing (2192-113X) is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.) |
| Database: |
Complementary Index |