Evaluating the Effectiveness of Deep Learning Models for Foundational Program Analysis Tasks

Detailed Bibliography
Title: Evaluating the Effectiveness of Deep Learning Models for Foundational Program Analysis Tasks
Authors: Qian Chen, Chenyang Yu, Ruyan Liu, Chi Zhang, Yu Wang, Ke Wang, Ting Su, Linzhang Wang
Source: Proceedings of the ACM on Programming Languages. 8:500-528
Publisher Information: Association for Computing Machinery (ACM), 2024.
Publication Year: 2024
Subject Terms: 0103 physical sciences; 0202 electrical engineering, electronic engineering, information engineering; 02 engineering and technology; 01 natural sciences
Description: While deep neural networks provide state-of-the-art solutions to a wide range of programming language tasks, their effectiveness in dealing with foundational program analysis tasks remains underexplored. In this paper, we present an empirical study that evaluates four prominent models of code (i.e., CuBERT, CodeBERT, GGNN, and Graph Sandwiches) in two such foundational tasks: (1) alias prediction, in which models predict whether two pointers must alias, may alias or must not alias; and (2) equivalence prediction, in which models predict whether or not two programs are semantically equivalent. At the core of this study is CodeSem, a dataset built upon the source code of real-world flagship software (e.g., Linux Kernel, GCC, MySQL) and manually validated for the two prediction tasks. Results show that all models are accurate in both prediction tasks, especially CuBERT with an accuracy of 89% and 84% in alias prediction and equivalence prediction, respectively. We also conduct a comprehensive, in-depth analysis of the results of all models in both tasks, concluding that deep learning models are generally capable of performing foundational tasks in program analysis even though in specific cases their weaknesses are also evident. Our code and evaluation data are publicly available at https://github.com/CodeSemDataset/CodeSem.
Document Type: Article
Language: English
ISSN: 2475-1421
DOI: 10.1145/3649829
Rights: CC BY
Accession Number: edsair.doi...........18cede87a0b51c8d8a20d40d768bff1c
Database: OpenAIRE
Header DbId: edsair
DbLabel: OpenAIRE
An: edsair.doi...........18cede87a0b51c8d8a20d40d768bff1c
RelevancyScore: 974
AccessLevel: 3
PubType: Academic Journal
PubTypeId: academicJournal
PreciseRelevancyScore: 974.294311523438
IllustrationInfo
Items – Name: Title
  Label: Title
  Group: Ti
  Data: Evaluating the Effectiveness of Deep Learning Models for Foundational Program Analysis Tasks
– Name: Author
  Label: Authors
  Group: Au
  Data: Qian Chen, Chenyang Yu, Ruyan Liu, Chi Zhang, Yu Wang, Ke Wang, Ting Su, Linzhang Wang
– Name: TitleSource
  Label: Source
  Group: Src
  Data: Proceedings of the ACM on Programming Languages. 8:500-528
– Name: Publisher
  Label: Publisher Information
  Group: PubInfo
  Data: Association for Computing Machinery (ACM), 2024.
– Name: DatePubCY
  Label: Publication Year
  Group: Date
  Data: 2024
– Name: Subject
  Label: Subject Terms
  Group: Su
  Data: 0103 physical sciences; 0202 electrical engineering, electronic engineering, information engineering; 02 engineering and technology; 01 natural sciences
– Name: Abstract
  Label: Description
  Group: Ab
  Data: While deep neural networks provide state-of-the-art solutions to a wide range of programming language tasks, their effectiveness in dealing with foundational program analysis tasks remains underexplored. In this paper, we present an empirical study that evaluates four prominent models of code (i.e., CuBERT, CodeBERT, GGNN, and Graph Sandwiches) in two such foundational tasks: (1) alias prediction, in which models predict whether two pointers must alias, may alias or must not alias; and (2) equivalence prediction, in which models predict whether or not two programs are semantically equivalent. At the core of this study is CodeSem, a dataset built upon the source code of real-world flagship software (e.g., Linux Kernel, GCC, MySQL) and manually validated for the two prediction tasks. Results show that all models are accurate in both prediction tasks, especially CuBERT with an accuracy of 89% and 84% in alias prediction and equivalence prediction, respectively. We also conduct a comprehensive, in-depth analysis of the results of all models in both tasks, concluding that deep learning models are generally capable of performing foundational tasks in program analysis even though in specific cases their weaknesses are also evident. Our code and evaluation data are publicly available at https://github.com/CodeSemDataset/CodeSem.
– Name: TypeDocument
  Label: Document Type
  Group: TypDoc
  Data: Article
– Name: Language
  Label: Language
  Group: Lang
  Data: English
– Name: ISSN
  Label: ISSN
  Group: ISSN
  Data: 2475-1421
– Name: DOI
  Label: DOI
  Group: ID
  Data: 10.1145/3649829
– Name: Copyright
  Label: Rights
  Group: Cpyrght
  Data: CC BY
– Name: AN
  Label: Accession Number
  Group: ID
  Data: edsair.doi...........18cede87a0b51c8d8a20d40d768bff1c
PLink https://erproxy.cvtisr.sk/sfx/access?url=https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsair&AN=edsair.doi...........18cede87a0b51c8d8a20d40d768bff1c
RecordInfo BibRecord:
  BibEntity:
    Identifiers:
      – Type: doi
        Value: 10.1145/3649829
    Languages:
      – Text: English
    PhysicalDescription:
      Pagination:
        PageCount: 29
        StartPage: 500
    Subjects:
      – SubjectFull: 0103 physical sciences
        Type: general
      – SubjectFull: 0202 electrical engineering, electronic engineering, information engineering
        Type: general
      – SubjectFull: 02 engineering and technology
        Type: general
      – SubjectFull: 01 natural sciences
        Type: general
    Titles:
      – TitleFull: Evaluating the Effectiveness of Deep Learning Models for Foundational Program Analysis Tasks
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Qian Chen
      – PersonEntity:
          Name:
            NameFull: Chenyang Yu
      – PersonEntity:
          Name:
            NameFull: Ruyan Liu
      – PersonEntity:
          Name:
            NameFull: Chi Zhang
      – PersonEntity:
          Name:
            NameFull: Yu Wang
      – PersonEntity:
          Name:
            NameFull: Ke Wang
      – PersonEntity:
          Name:
            NameFull: Ting Su
      – PersonEntity:
          Name:
            NameFull: Linzhang Wang
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 29
              M: 04
              Type: published
              Y: 2024
          Identifiers:
            – Type: issn-print
              Value: 24751421
            – Type: issn-locals
              Value: edsair
            – Type: issn-locals
              Value: edsairFT
          Numbering:
            – Type: volume
              Value: 8
          Titles:
            – TitleFull: Proceedings of the ACM on Programming Languages
              Type: main