Interpretable Code Summarization

Code summarization is a process of creating a readable natural language from programming source codes. Code summarization has become a popular research topic for software maintenance, code generation, and code recovery. Existing code summarization methods follow the encoding/decoding approach and us...

Celý popis

Uloženo v:
Podrobná bibliografie
Vydáno v:IEEE transactions on reliability Ročník 74; číslo 1; s. 2280 - 2289
Hlavní autoři: Kamal, Md Sarwar, Nimmy, Sonia Farhana, Dey, Nilanjan
Médium: Journal Article
Jazyk:angličtina
Vydáno: New York IEEE 01.03.2025
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Témata:
ISSN:0018-9529, 1558-1721
On-line přístup:Získat plný text
Tagy: Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
Popis
Shrnutí:Code summarization is a process of creating a readable natural language from programming source codes. Code summarization has become a popular research topic for software maintenance, code generation, and code recovery. Existing code summarization methods follow the encoding/decoding approach and use various machine learning techniques to generate natural language from source codes. Although most of these methods are state of the art, it is difficult to understand the complex encoding and decoding process to map the tokens with natural language words. Therefore, these coding and decoding approaches are treated as opaque models (black box). This research proposes explainable AI methods that overcome the black box features for the token mapping in code summarization process. Here, we created an abstract syntax tree (AST) from the tokens of the source code. We then embedded the AST into natural language words using a bilingual statistical probability approach to generate possible statistical parse trees. We applied a page rank algorithm among the parse trees to rank the trees. From the best-ranked tree, we generate the comment for the corresponding code snippet. To explain our code generation method, we used Takagi-Sugeno fuzzy approach, layerwise relevance propagation and a hidden Markov model. These approaches make our method trustworthy and understandable to humans to understand the process of source code token mapping with natural language words.
Bibliografie:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:0018-9529
1558-1721
DOI:10.1109/TR.2024.3392876