Scalable code clone detection tool based on semantic analysis

Bibliographic Details
Published in: Trudy Instituta sistemnogo programmirovaniâ Vol. 27; no. 1; pp. 39-50
Main Authors: Sargsyan, Sevak, Kurmnagaleev, Shamil, Belevantsev, Andrey, Aslanyan, Hayk, Baloian, Artiom
Format: Journal Article
Language: English
Published: Russian Academy of Sciences, Ivannikov Institute for System Programming 01.10.2018
ISSN: 2079-8156, 2220-6426
Abstract: This article surveys methods of code clone detection. Based on an analysis of existing methods, a new approach to code clone detection is proposed for the C/C++ languages. The method relies on semantic analysis of the project, which allows code clones to be detected with high accuracy. It is implemented as part of the LLVM compiler, which allows it to outperform existing methods. The tool consists of three basic parts. The first part is Program Dependence Graph (PDG) generation and serialization. PDGs are constructed while the project is compiled, from LLVM's intermediate representation. Several simple optimizations are applied to these graphs, and they are then serialized to files. The second part is the analysis of the stored PDGs. PDGs are loaded from the files and split into subgraphs, and every subgraph is treated as a clone candidate. A new splitting method is proposed, which increases the number of detected clones. Two types of algorithms are used for clone detection. Algorithms of the first type try to prove that a pair of PDGs cannot be clones. These algorithms have linear complexity, which makes it possible to process a huge number of PDG pairs. If they fail to rule a pair out, graph isomorphism algorithms are applied to detect similar subgraphs. The last part is an integrated system for automatically testing the accuracy of the algorithms. For a given project, a set of clones is generated automatically, and the clone detection algorithms are then applied to the original source and to the generated one.
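The two-stage detection described in the abstract (a linear-time check that tries to rule a pair out, followed by a graph isomorphism check only when that fails) can be illustrated with a minimal Python sketch. This is not the tool's implementation, which is part of LLVM and operates on PDGs built from LLVM's intermediate representation. The names `PDG`, `cannot_be_clones`, `is_isomorphic` and `are_clones` are hypothetical, the node labels are made-up opcode strings, and exact label-preserving isomorphism is used here as a simplified stand-in for the similar-subgraph detection the abstract mentions.

```python
from collections import Counter
from itertools import permutations


class PDG:
    """Toy program dependence graph: labelled nodes and directed edges."""

    def __init__(self, labels, edges):
        self.labels = dict(labels)  # node id -> label (e.g. an opcode name)
        self.edges = set(edges)     # set of (source id, destination id)


def cannot_be_clones(a, b):
    """Linear-time filter: True only if the pair is provably not isomorphic."""
    if len(a.labels) != len(b.labels) or len(a.edges) != len(b.edges):
        return True
    # Isomorphic graphs must carry the same multiset of node labels.
    return Counter(a.labels.values()) != Counter(b.labels.values())


def is_isomorphic(a, b):
    """Brute-force label-preserving isomorphism (exponential, toy sizes only)."""
    nodes_a, nodes_b = list(a.labels), list(b.labels)
    if len(nodes_a) != len(nodes_b):
        return False
    for perm in permutations(nodes_b):
        mapping = dict(zip(nodes_a, perm))
        if any(a.labels[n] != b.labels[mapping[n]] for n in nodes_a):
            continue
        if {(mapping[s], mapping[d]) for s, d in a.edges} == b.edges:
            return True
    return False


def are_clones(a, b):
    """Run the cheap filter first; fall back to the expensive check."""
    if cannot_be_clones(a, b):
        return False
    return is_isomorphic(a, b)


g1 = PDG({1: "load", 2: "add", 3: "store"}, {(1, 2), (2, 3)})
g2 = PDG({"x": "add", "y": "store", "z": "load"}, {("z", "x"), ("x", "y")})
g3 = PDG({1: "load", 2: "mul", 3: "store"}, {(1, 2), (2, 3)})
g4 = PDG({1: "load", 2: "add", 3: "store"}, {(1, 2), (1, 3)})
```

In this sketch `g1` and `g3` are rejected by the cheap filter alone, while `g1` and `g4` pass the filter and are only separated by the isomorphism check, which is the situation the expensive stage exists for.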
DOI 10.15514/ISPRAS-2015-27(1)-3
OpenAccessLink https://doaj.org/article/bcbe63d7559e4b2e8386507d75222ac6
SubjectTerms llvm
pdg
code clone detection
semantic analysis