MSSA: multi-stage semantic-aware neural network for binary code similarity detection.

Bibliographic Details
Title: MSSA: multi-stage semantic-aware neural network for binary code similarity detection.
Authors: Wan, Bangrui; Zhou, Jianjun; Wang, Ying; Chen, Feng; Qian, Ying
Source: PeerJ Computer Science; Jan2025, p1-22, 22p
Subject Terms: BINARY codes, INTEGRAL functions, TRANSFORMER models, LINEAR network coding, DEEP learning
Abstract: Binary code similarity detection (BCSD) aims to identify whether a pair of binary code snippets is similar, and it is widely used for tasks such as malware analysis, patch analysis, and clone detection. Current state-of-the-art approaches are based on the Transformer architecture and require substantial computational resources. Learning-based approaches still leave room for improvement in capturing the deeper semantics of binary code. In this paper, we propose MSSA, a multi-stage semantic-aware neural network for BCSD at the function level. Through four semantic-aware neural networks, it integrates the semantic and structural information of assembly instructions within basic blocks, between basic blocks, and across the entire function, achieving a deep understanding of binary code semantics. MSSA is a lightweight model with only 0.38M parameters in its backbone network, which makes it suitable for deployment in CPU environments. Experimental results show that MSSA outperforms Gemini, Asm2Vec, SAFE, and jTrans in classification performance and ranks second only to the Transformer-based jTrans in retrieval performance. [ABSTRACT FROM AUTHOR]
Copyright of PeerJ Computer Science is the property of PeerJ Inc. and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
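The abstract distinguishes two evaluation settings for function-level BCSD: classification (is this pair of functions similar?) and retrieval (rank a pool of candidate functions for a query). The sketch below illustrates both settings under stated assumptions; it does not reproduce MSSA's four-stage network. It assumes each function has already been mapped to a fixed-length embedding vector, that pairs are scored by cosine similarity, and that retrieval is summarized with MRR and Recall@1, metrics common in the BCSD literature. The record does not say which scoring function or metrics MSSA uses, and the names `classify_pair`, `retrieval_metrics`, and the 0.5 threshold are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two function embeddings (assumed scoring function)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero embedding as dissimilar to everything
    return dot / (norm_a * norm_b)


def classify_pair(a: Vector, b: Vector, threshold: float = 0.5) -> bool:
    """Classification setting: predict 'similar' when the score reaches a
    hypothetical threshold (in practice it would be tuned on validation data)."""
    return cosine_similarity(a, b) >= threshold


def retrieval_metrics(
    queries: Dict[str, Vector],
    pool: Dict[str, Vector],
    ground_truth: Dict[str, str],
) -> Tuple[float, float]:
    """Retrieval setting: for each query, rank every pool function by similarity
    and locate the true match. Returns (MRR, Recall@1)."""
    reciprocal_ranks = []
    hits_at_1 = 0
    for qid, qvec in queries.items():
        # Highest similarity first.
        ranked = sorted(pool, key=lambda pid: cosine_similarity(qvec, pool[pid]), reverse=True)
        rank = ranked.index(ground_truth[qid]) + 1  # 1-based rank of the true match
        reciprocal_ranks.append(1.0 / rank)
        if rank == 1:
            hits_at_1 += 1
    n = len(queries)
    return sum(reciprocal_ranks) / n, hits_at_1 / n


if __name__ == "__main__":
    # Toy 2-D embeddings (hypothetical, for illustration only).
    queries = {"q1": [1.0, 0.0], "q2": [0.0, 1.0]}
    pool = {"p1": [1.0, 0.0], "p2": [0.9, 0.1]}
    truth = {"q1": "p1", "q2": "p1"}  # q2's true match is ranked second
    mrr, r1 = retrieval_metrics(queries, pool, truth)
    print(f"MRR={mrr:.2f}, Recall@1={r1:.2f}")  # prints MRR=0.75, Recall@1=0.50
```

Cosine similarity makes the score independent of embedding magnitude, so only the direction of each vector matters. Because every query is compared against the whole pool, retrieval cost grows linearly with pool size, which is one reason a small model that embeds each function cheaply on a CPU can be attractive.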
Database: Complementary Index
CustomLinks:
  – Url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=search&db=pmc&term=2376-5992[TA]+AND+1[PG]+AND+2025[PDAT]
    Name: FREE - PubMed Central (ISSN based link)
    Category: fullText
    Text: Full Text
  – Url: https://resolver.ebscohost.com/openurl?sid=EBSCO:edb&genre=article&issn=23765992&ISBN=&volume=&issue=&date=20250101&spage=1&pages=1-22&title=PeerJ Computer Science&atitle=MSSA%3A%20multi-stage%20semantic-aware%20neural%20network%20for%20binary%20code%20similarity%20detection.&aulast=Wan%2C%20Bangrui&id=DOI:10.7717/peerj-cs.2504
    Name: Full Text Finder
    Category: fullText
    Text: Full Text Finder
  – Url: https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=EBSCO&SrcAuth=EBSCO&DestApp=WOS&ServiceName=TransferToWoS&DestLinkType=GeneralSearchSummary&Func=Links&author=Wan%20B
    Name: ISI
    Category: fullText
    Text: Find this article in Web of Science
Header DbId: edb
DbLabel: Complementary Index
An: 182849440
PubType: Academic Journal
PubTypeId: academicJournal
PLink https://erproxy.cvtisr.sk/sfx/access?url=https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edb&AN=182849440
RecordInfo BibRecord:
  BibEntity:
    Identifiers:
      – Type: doi
        Value: 10.7717/peerj-cs.2504
    Languages:
      – Code: eng
        Text: English
    PhysicalDescription:
      Pagination:
        PageCount: 22
        StartPage: 1
    Subjects:
      – SubjectFull: BINARY codes
        Type: general
      – SubjectFull: INTEGRAL functions
        Type: general
      – SubjectFull: TRANSFORMER models
        Type: general
      – SubjectFull: LINEAR network coding
        Type: general
      – SubjectFull: DEEP learning
        Type: general
    Titles:
      – TitleFull: MSSA: multi-stage semantic-aware neural network for binary code similarity detection.
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Wan, Bangrui
      – PersonEntity:
          Name:
            NameFull: Zhou, Jianjun
      – PersonEntity:
          Name:
            NameFull: Wang, Ying
      – PersonEntity:
          Name:
            NameFull: Chen, Feng
      – PersonEntity:
          Name:
            NameFull: Qian, Ying
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 01
              M: 01
              Text: Jan2025
              Type: published
              Y: 2025
          Identifiers:
            – Type: issn-print
              Value: 23765992
          Titles:
            – TitleFull: PeerJ Computer Science
              Type: main