Repository-Level Graph Representation Learning for Enhanced Security Patch Detection

Bibliographic details
Published in: Proceedings / International Conference on Software Engineering, pp. 1-13
Main authors: Wen, Xin-Cheng; Lin, Zirui; Gao, Cuiyun; Zhang, Hongyu; Wang, Yong; Liao, Qing
Format: Conference proceeding
Language: English
Published: IEEE, 26 April 2025
ISSN: 1558-1225
Online access: Full text
Abstract Software vendors often silently release security patches without providing sufficient advisories (e.g., Common Vulnerabilities and Exposures) or with delayed updates via resources (e.g., National Vulnerability Database). Therefore, it has become crucial to detect these security patches to ensure secure software maintenance. However, existing methods face the following challenges: (1) They primarily focus on the information within the patches themselves, overlooking the complex dependencies in the repository. (2) Security patches typically involve multiple functions and files, which makes it harder to learn good representations. To alleviate these challenges, this paper proposes a Repository-level Security Patch Detection framework named RepoSPD, which comprises three key components: 1) a repository-level graph construction, RepoCPG, which represents software patches by merging pre-patch and post-patch source code at the repository level; 2) a structure-aware patch representation, which fuses the graph and sequence branches and aims at comprehending the relationship among multiple code changes; 3) progressive learning, which helps the model balance semantic and structural information. To evaluate RepoSPD, we employ two widely used datasets in security patch detection: SPI-DB and PatchDB. We further extend these datasets to the repository level, incorporating a total of 20,238 and 28,781 repository versions in the C/C++ programming languages, respectively, denoted as SPI-DB* and PatchDB*. We compare RepoSPD with six existing security patch detection methods and five static tools. Our experimental results demonstrate that RepoSPD outperforms the state-of-the-art baseline, with improvements of 11.90% and 3.10% in terms of accuracy on the two datasets, respectively. These results underscore the effectiveness of RepoSPD in detecting security patches. Furthermore, RepoSPD can detect 151 security patches, which outperforms the best-performing baseline by 21.36% with respect to accuracy.
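The abstract describes RepoCPG as merging pre-patch and post-patch code into one graph. The idea of such a merge can be illustrated with a minimal sketch. This is not the authors' construction: the `CodeGraph` type, the `merge_patch_graphs` function, the node names, and the three-way tagging scheme are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

# Hypothetical edge representation: (source node, target node, edge kind).
Edge = Tuple[str, str, str]


@dataclass
class CodeGraph:
    """A toy code graph: node ids (e.g. functions) plus typed edges."""
    nodes: Set[str] = field(default_factory=set)
    edges: Set[Edge] = field(default_factory=set)


def _tag(in_pre: bool, in_post: bool) -> str:
    """Label an element by which version(s) of the code contain it."""
    if in_pre and in_post:
        return "unchanged"
    return "deleted" if in_pre else "added"


def merge_patch_graphs(
    pre: CodeGraph, post: CodeGraph
) -> Tuple[Dict[str, str], Dict[Edge, str]]:
    """Union the two graphs and tag every node and edge by its origin."""
    node_tags = {n: _tag(n in pre.nodes, n in post.nodes)
                 for n in pre.nodes | post.nodes}
    edge_tags = {e: _tag(e in pre.edges, e in post.edges)
                 for e in pre.edges | post.edges}
    return node_tags, edge_tags


# Illustrative patch (assumed, not from the paper): a length check is
# inserted between a parser and a memcpy call.
pre = CodeGraph(
    nodes={"parse", "memcpy"},
    edges={("parse", "memcpy", "call")},
)
post = CodeGraph(
    nodes={"parse", "check_len", "memcpy"},
    edges={("parse", "check_len", "call"),
           ("check_len", "memcpy", "control")},
)
node_tags, edge_tags = merge_patch_graphs(pre, post)
```

In this toy merge, `check_len` is tagged as added, the direct `parse` to `memcpy` call edge is tagged as deleted, and both original nodes are tagged as unchanged. A single tagged graph of this kind keeps the context around a change visible next to the change itself, which is the motivation the abstract gives for working at the repository level.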
Author_xml – sequence: 1
  givenname: Xin-Cheng
  surname: Wen
  fullname: Wen, Xin-Cheng
  email: xiamenwxc@foxmail.com
  organization: Harbin Institute of Technology,Shenzhen,China
– sequence: 2
  givenname: Zirui
  surname: Lin
  fullname: Lin, Zirui
  email: 210110128@stu.hit.edu.cn
  organization: Harbin Institute of Technology,Shenzhen,China
– sequence: 3
  givenname: Cuiyun
  surname: Gao
  fullname: Gao, Cuiyun
  email: gaocuiyun@hit.edu.cn
  organization: Harbin Institute of Technology,Shenzhen,China
– sequence: 4
  givenname: Hongyu
  surname: Zhang
  fullname: Zhang, Hongyu
  email: hyzhang@cqu.edu.cn
  organization: Chongqing University,Chongqing,China
– sequence: 5
  givenname: Yong
  surname: Wang
  fullname: Wang, Yong
  email: yongwang@ahpu.edu.cn
  organization: Anhui Polytechnic University,Anhui,China
– sequence: 6
  givenname: Qing
  surname: Liao
  fullname: Liao, Qing
  email: liaoqing@hit.edu.cn
  organization: Harbin Institute of Technology,Shenzhen,China
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/ICSE55347.2025.00121
Discipline Computer Science
EISBN 9798331505691
EISSN 1558-1225
EndPage 13
ExternalDocumentID 11029757
Genre orig-research
IsPeerReviewed false
IsScholarly true
Language English
PageCount 13
PublicationDate 2025-04-26
PublicationTitle Proceedings / International Conference on Software Engineering
PublicationTitleAbbrev ICSE
PublicationYear 2025
Publisher IEEE
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms Accuracy
Codes
Computer languages
deep learning
Merging
Representation learning
Security
security patch detection
Semantics
Software engineering
Software maintenance
Source coding
Title Repository-Level Graph Representation Learning for Enhanced Security Patch Detection
URI https://ieeexplore.ieee.org/document/11029757