Scalable code clone detection tool based on semantic analysis
This article describes methods of code clone detection. A new approach to code clone detection is proposed for the C/C++ languages, based on an analysis of existing methods. The method relies on semantic analysis of the project, which allows code clones to be detected with high accuracy. It is realized as part...
| Published in: | Trudy Instituta sistemnogo programmirovaniâ Vol. 27; no. 1; pp. 39 - 50 |
|---|---|
| Main Authors: | Sargsyan, Sevak; Kurmnagaleev, Shamil; Belevantsev, Andrey; Aslanyan, Hayk; Baloian, Artiom |
| Format: | Journal Article |
| Language: | English |
| Published: | Russian Academy of Sciences, Ivannikov Institute for System Programming, 01.10.2018 |
| Subjects: | LLVM; PDG; code clone detection; semantic analysis |
| ISSN: | 2079-8156, 2220-6426 |
| Abstract | This article surveys methods of code clone detection and, based on an analysis of existing methods, proposes a new approach for C/C++. The method relies on semantic analysis of the project, which allows code clones to be detected with high accuracy. It is implemented as part of the LLVM compiler, which allows it to outperform existing methods. The tool consists of three basic parts. The first part is Program Dependence Graph (PDG) generation and serialization. PDGs are constructed at compile time from LLVM's intermediate representation. Several simple optimizations are applied to these graphs, and they are then serialized to a file. The second part is the analysis of the stored PDGs. PDGs are loaded from files and split into subgraphs, each of which is considered a clone candidate. A new splitting method is proposed that increases the number of detected clones. Two types of algorithms are used for clone detection. Algorithms of the first type try to prove that a pair of PDGs cannot be clones. These algorithms have linear complexity, which allows a huge number of PDG pairs to be processed. If they fail to rule a pair out, graph isomorphism algorithms are applied to detect similar subgraphs. The last part is an integrated system for automatically testing the accuracy of the algorithms. For a given project, a set of clones is generated automatically, and the clone detection algorithms are then applied to the original source and the generated one. |
|---|---|
| Author | Aslanyan, Hayk; Baloian, Artiom; Sargsyan, Sevak; Belevantsev, Andrey; Kurmnagaleev, Shamil |
| ContentType | Journal Article |
| DOI | 10.15514/ISPRAS-2015-27(1)-3 |
| Discipline | Computer Science |
| EISSN | 2220-6426 |
| EndPage | 50 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 1 |
| OpenAccessLink | https://doaj.org/article/bcbe63d7559e4b2e8386507d75222ac6 |
| PageCount | 12 |
| PublicationDate | 2018-10-01 |
| PublicationTitle | Trudy Instituta sistemnogo programmirovaniâ |
| PublicationYear | 2018 |
| Publisher | Russian Academy of Sciences, Ivannikov Institute for System Programming |
| StartPage | 39 |
| SubjectTerms | LLVM; PDG; code clone detection; semantic analysis |
| Title | Scalable code clone detection tool based on semantic analysis |
| URI | https://doaj.org/article/bcbe63d7559e4b2e8386507d75222ac6 |
| Volume | 27 |
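
The two-stage detection described in the abstract (a cheap linear-time check that tries to prove a pair of PDGs cannot be clones, followed by a graph isomorphism check only when that proof fails) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the article's implementation: the `PDG` class, the node-label and edge-count filter, and the brute-force isomorphism test are hypothetical stand-ins, and the sketch only handles exact, label-preserving matches on very small graphs.

```python
from collections import Counter
from itertools import permutations


class PDG:
    """Toy program dependence graph (hypothetical structure, not LLVM's)."""

    def __init__(self, labels, edges):
        self.labels = list(labels)  # labels[i] is the opcode name of node i
        self.edges = set(edges)     # directed (src, dst) dependence pairs


def cannot_be_clones(a, b):
    """Linear-time filter: True means the pair is provably not an exact clone."""
    if len(a.labels) != len(b.labels) or len(a.edges) != len(b.edges):
        return True
    # Different multisets of opcodes rule out any label-preserving mapping.
    return Counter(a.labels) != Counter(b.labels)


def is_isomorphic(a, b):
    """Brute-force label-preserving isomorphism; feasible only for tiny graphs."""
    n = len(a.labels)
    if n != len(b.labels):
        return False
    for perm in permutations(range(n)):
        # perm[i] is the node of b that node i of a is mapped to.
        if any(a.labels[i] != b.labels[perm[i]] for i in range(n)):
            continue
        if {(perm[s], perm[d]) for s, d in a.edges} == b.edges:
            return True
    return False


def are_clones(a, b):
    """Run the cheap filter first; fall back to isomorphism only if it fails."""
    if cannot_be_clones(a, b):
        return False
    return is_isomorphic(a, b)


# Small usage example with hand-made graphs.
g1 = PDG(["load", "add", "store"], [(0, 1), (1, 2)])
g2 = PDG(["store", "load", "add"], [(1, 2), (2, 0)])  # g1 with nodes renumbered
g3 = PDG(["load", "mul", "store"], [(0, 1), (1, 2)])  # different opcode
g4 = PDG(["load", "add", "store"], [(0, 1), (0, 2)])  # same opcodes, other edges

print(are_clones(g1, g2), are_clones(g1, g3), are_clones(g1, g4))
```

In this sketch the pair `g1`/`g3` is rejected by the filter alone, while `g1`/`g4` passes the filter and is rejected only by the more expensive isomorphism step. That ordering is the point of the design: most candidate pairs should be discarded before any costly graph matching runs.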