Towards Better Answers: Automated Stack Overflow Post Updating

Utilizing code snippets on Stack Overflow (SO) is a common practice among developers for problem-solving. Although SO code snippets serve as valuable resources, it is important to acknowledge their imperfections, reusing problematic code snippets can lead to the introduction of suboptimal or buggy c...

Celý popis

Uloženo v:

Podrobná bibliografie
Vydáno v:	Proceedings / International Conference on Software Engineering s. 591 - 603
Hlavní autoři:	Mai, Yubo, Gao, Zhipeng, Wang, Haoye, Bi, Tingting, Hu, Xing, Xia, Xin, Sun, Jianling
Médium:	Konferenční příspěvek
Jazyk:	angličtina
Vydáno:	IEEE 26.04.2025
Témata:	Benchmark testing Codes Data integrity Data Quality Knowledge engineering Large language models Post Updating Problem-solving Reliability engineering Software Software engineering Software reliability Stack Overflow
ISSN:	1558-1225
On-line přístup:	Získat plný text
Tagy:	Přidat tag Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!

Popis
Shrnutí:	Utilizing code snippets on Stack Overflow (SO) is a common practice among developers for problem-solving. Although SO code snippets serve as valuable resources, it is important to acknowledge their imperfections, reusing problematic code snippets can lead to the introduction of suboptimal or buggy code into software projects. SO comments often point out weaknesses of a post and provide valuable insights to improve the quality of answers, while SO comments are usually missed and/or ignored, leaving these problematic code snippets untouched. In this work, we first investigate the task of automatic SO posts updating based on their associated comments. We introduce a novel framework, named SOUP (Stack Overflow Updator for Post) for this task. SOUP addresses two key tasks: Valid Comment-Edit Prediction (VCP) and Automatic Post Updating (APU). We fine-tuned a large language model, CodeLlama, using low-rank adaptation techniques to complete the VCP task, and constructed a dataset containing 78k valid comment-edit pairs for the APU task. Subsequently, we tested the performance of multiple large language models on the APU task. Extensive experimental results show the promising performance of our model over a set of benchmarks. Moreover, we also perform an in-the-wild evaluation on Stack Overflow, we submitted 50 edits generated by our approach to Stack Overflow posts and 21 of them have been verified and accepted by SO maintainers, further proving the practical value of SOUP.
ISSN:	1558-1225
DOI:	10.1109/ICSE55347.2025.00024