Software defect prediction via LSTM

Bibliographic details
Published in:	IET Software, Vol. 14, No. 4, pp. 443-450
Main authors:	Deng, Jiehan; Lu, Lu; Qiu, Shaojian
Format:	Journal Article
Language:	English
Published:	The Institution of Engineering and Technology, 1 August 2020
Subjects:	AST node sequence; contextual features; defective code; feature extraction; human languages; learning (artificial intelligence); long short-term memory network; LSTM; machine learning techniques; numerical vectors; open source projects; program abstract syntax trees; program debugging; program diagnostics; programming languages; public domain software; recurrent neural nets; Research Article; semantic features; software defect prediction approaches; software lifecycle; software quality; structural information; trees (mathematics); word embedding techniques
ISSN:	1751-8806, 1751-8814

Description
Summary:	Software quality plays an important role in the software lifecycle. Traditional software defect prediction approaches have mainly relied on hand-crafted features to detect defects. However, like human languages, programming languages contain rich semantic and structural information, and the cause of defective code is closely related to its context. Because they fail to capture this information, traditional approaches perform far from satisfactorily. In this study, the authors leverage a long short-term memory (LSTM) network to automatically learn semantic and contextual features from source code. Specifically, they first extract the program's abstract syntax trees (ASTs), which are made up of AST nodes, and then evaluate what and how much information can be preserved for several node types. They traverse the AST of each file and feed the resulting node sequence into the LSTM network to automatically learn the semantic and contextual features of the program, which are then used to determine whether the file is defective. Experimental results on several open-source projects show that the proposed LSTM method is superior to state-of-the-art methods.
DOI:	10.1049/iet-sen.2019.0149
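
The preprocessing step described in the summary (traverse each file's AST, keep selected node types, and turn the node sequence into numerical vectors) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes Python's standard `ast` module as a stand-in parser, since the record does not say which language the studied projects use, and `KEPT_NODE_TYPES`, `node_sequence`, `build_vocab`, and `encode` are hypothetical names chosen for illustration. The embedding layer and the LSTM classifier that would consume the integer sequences are not shown.

```python
import ast

# Hypothetical choice of node types to keep. The summary says several node
# types are evaluated, but the record does not list them.
KEPT_NODE_TYPES = (ast.FunctionDef, ast.Call, ast.If, ast.For, ast.While, ast.Return)


def node_sequence(source):
    """Return the names of kept AST node types in depth-first (pre-order) order."""
    tokens = []

    def visit(node):
        if isinstance(node, KEPT_NODE_TYPES):
            tokens.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens


def build_vocab(sequences):
    """Map each distinct token to a positive integer id; 0 is reserved for padding."""
    vocab = {}
    for seq in sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab


def encode(seq, vocab, length):
    """Convert tokens to ids, then truncate or right-pad with 0 to a fixed length."""
    ids = [vocab[tok] for tok in seq][:length]
    return ids + [0] * (length - len(ids))


# Two toy "files" standing in for project source files.
files = {
    "a.py": "def f(x):\n    if x:\n        return g(x)\n    return 0\n",
    "b.py": "for i in range(3):\n    print(i)\n",
}
sequences = {name: node_sequence(src) for name, src in files.items()}
vocab = build_vocab(sequences.values())
encoded = {name: encode(seq, vocab, 6) for name, seq in sequences.items()}
```

Each entry of `encoded` is a fixed-length integer vector per file. In a setup like the one the summary describes, such vectors would typically pass through a word-embedding layer before reaching the LSTM, which outputs a defective or non-defective label for the file.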