DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones

Bibliographic details
Published in: 29th International Conference on Software Engineering (ICSE'07), pp. 96-105
Main authors: Lingxiao Jiang, Misherghi, G., Zhendong Su, Glondu, S.
Format: Conference paper
Language: English
Published: IEEE, 1 May 2007
ISBN: 9780769528281, 0769528287
ISSN: 0270-5257
Description
Summary: Detecting code clones has many software engineering applications. Existing approaches either do not scale to large code bases or are not robust against minor code modifications. In this paper, we present an efficient algorithm for identifying similar subtrees and apply it to tree representations of source code. Our algorithm is based on a novel characterization of subtrees with numerical vectors in the Euclidean space R^n and an efficient algorithm to cluster these vectors with respect to the Euclidean distance metric. Subtrees with vectors in one cluster are considered similar. We have implemented our tree similarity algorithm as a clone detection tool called DECKARD and evaluated it on large code bases written in C and Java, including the Linux kernel and JDK. Our experiments show that DECKARD is both scalable and accurate. It is also language independent, applicable to any language with a formally specified grammar.
DOI: 10.1109/ICSE.2007.30
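
The idea in the summary (map each subtree to a numerical vector, then treat subtrees whose vectors are close in Euclidean distance as similar) can be illustrated with a minimal Python sketch. This is not DECKARD's implementation. It rests on several assumptions that the record does not confirm: each vector simply counts the node kinds occurring in a subtree, the `Node` class and the `max_dist` and `min_size` parameters are hypothetical, and a naive pairwise comparison stands in for the efficient clustering step the summary mentions.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical parse-tree node: a node kind plus child subtrees."""
    kind: str
    children: list[Node] = field(default_factory=list)


def characteristic_vectors(root: Node) -> list[tuple[Node, Counter]]:
    """Return one (subtree root, vector) pair per subtree, in post-order.

    Assumption: a subtree's vector counts how often each node kind
    occurs in it, so it equals the sum of its children's vectors plus
    one for the node's own kind.
    """
    out: list[tuple[Node, Counter]] = []

    def visit(node: Node) -> Counter:
        vec = Counter({node.kind: 1})
        for child in node.children:
            vec.update(visit(child))
        out.append((node, vec))
        return vec

    visit(root)
    return out


def euclidean(u: Counter, v: Counter) -> float:
    """Euclidean distance between two sparse count vectors."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u[k] - v[k]) ** 2 for k in keys))


def similar_pairs(vectors, max_dist: float, min_size: int):
    """Naive O(n^2) stand-in for clustering: pair up subtrees with at
    least min_size nodes whose vectors lie within max_dist."""
    big = [(n, v) for n, v in vectors if sum(v.values()) >= min_size]
    pairs = []
    for i in range(len(big)):
        for j in range(i + 1, len(big)):
            if euclidean(big[i][1], big[j][1]) <= max_dist:
                pairs.append((big[i][0], big[j][0]))
    return pairs


# Two structurally identical expressions (think a + b * c and x + y * z)
# next to an unrelated call.
tree1 = Node("add", [Node("id"), Node("mul", [Node("id"), Node("id")])])
tree2 = Node("add", [Node("id"), Node("mul", [Node("id"), Node("id")])])
tree3 = Node("call", [Node("id")])
root = Node("block", [tree1, tree2, tree3])

pairs = similar_pairs(characteristic_vectors(root), max_dist=1.0, min_size=5)
print(len(pairs))  # prints 1: only tree1 and tree2 are paired
```

Because identifier names are abstracted into a single node kind, the two expressions get identical vectors even though their variables differ, which illustrates the robustness to minor modifications that the summary describes. The pairwise loop is quadratic and would not scale to code bases the size of the Linux kernel; it is used here only to keep the sketch short.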