Comparing Unidirectional, Bidirectional, and Word2vec Models for Discovering Vulnerabilities in Compiled Lifted Code

Ransomware and other forms of malware cause significant financial and operational damage to organizations by exploiting long-standing and often difficult-to-detect software vulnerabilities. To detect vulnerabilities such as buffer overflows in compiled code, this research investigates the applicatio...

Bibliographic details
Published in: Proceedings (International Symposium on Digital Forensic and Security. Online), pp. 1-6
Main authors: McCully, Gary A., Hastings, John D., Xu, Shengjie, Fortier, Adam
Format: Conference paper
Language: English
Published: IEEE, 24 April 2025
Subjects:
ISSN: 2768-1831
Abstract Ransomware and other forms of malware cause significant financial and operational damage to organizations by exploiting long-standing and often difficult-to-detect software vulnerabilities. To detect vulnerabilities such as buffer overflows in compiled code, this research investigates the application of unidirectional transformer-based embeddings, specifically GPT-2. Using a dataset of LLVM functions, we trained a GPT-2 model to generate embeddings, which were subsequently used to build LSTM neural networks to differentiate between vulnerable and non-vulnerable code. Our study reveals that embeddings from the GPT-2 model significantly outperform those from bidirectional models of BERT and RoBERTa, achieving an accuracy of 92.5% and an F1-score of 89.7%. LSTM neural networks were developed with both frozen and unfrozen embedding model layers. The model with the highest performance was achieved when the embedding layers were unfrozen. Further, the research finds that, in exploring the impact of different optimizers within this domain, the SGD optimizer demonstrates superior performance over Adam. Overall, these findings reveal important insights into the potential of unidirectional transformer-based approaches in enhancing cybersecurity defenses.
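The abstract reports accuracy and F1-score for a binary vulnerable/non-vulnerable classifier. As a minimal sketch of how these two metrics are conventionally computed, the Python below derives them from true and predicted labels. The function name binary_metrics and the toy labels are illustrative assumptions, not taken from the paper, and the toy values do not reproduce the reported 92.5% or 89.7%.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall, f1) for 0/1 labels.

    Label 1 is treated as "vulnerable" and 0 as "non-vulnerable"
    (an assumed convention for this sketch).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, precision, recall, f1


# Hypothetical toy labels, for illustration only.
toy_true = [1, 1, 1, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 0, 0, 0, 0, 1]
accuracy, precision, recall, f1 = binary_metrics(toy_true, toy_pred)
print(f"accuracy={accuracy:.3f} f1={f1:.3f}")
```

Because F1 ignores true negatives, it can fall below accuracy when the vulnerable class is the minority, which is one plausible reason a study would report both figures.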
Author Fortier, Adam
Hastings, John D.
Xu, Shengjie
McCully, Gary A.
Author_xml – sequence: 1
  givenname: Gary A.
  surname: McCully
  fullname: McCully, Gary A.
  email: gary.mccully@ieee.org
  organization: Dakota State University,Madison,SD,USA
– sequence: 2
  givenname: John D.
  surname: Hastings
  fullname: Hastings, John D.
  email: john.hastings@dsu.edu
  organization: Dakota State University,Madison,SD,USA
– sequence: 3
  givenname: Shengjie
  surname: Xu
  fullname: Xu, Shengjie
  email: sjxu@arizona.edu
  organization: University of Arizona,Tucson,AZ,USA
– sequence: 4
  givenname: Adam
  surname: Fortier
  fullname: Fortier, Adam
  email: afortier8@gatech.edu
  organization: Georgia Institute of Technology,Atlanta,GA,USA
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/ISDFS65363.2025.11012025
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Social Welfare & Social Work
EISBN 9798331509934
EISSN 2768-1831
EndPage 6
ExternalDocumentID 11012025
Genre orig-research
GroupedDBID 6IE
6IF
6IK
6IL
6IN
AAJGR
ABLEC
ADZIZ
ALMA_UNASSIGNED_HOLDINGS
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
IEGSK
IPLJI
M43
OCL
RIE
RIL
ID FETCH-LOGICAL-i93t-983fa8408c7ef8e4be1e82ad39758fdf79cd67c1faa59e158ab239ca952bd7be3
IEDL.DBID RIE
IngestDate Wed Aug 27 01:48:11 EDT 2025
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-i93t-983fa8408c7ef8e4be1e82ad39758fdf79cd67c1faa59e158ab239ca952bd7be3
PageCount 6
ParticipantIDs ieee_primary_11012025
PublicationCentury 2000
PublicationDate 2025-April-24
PublicationDateYYYYMMDD 2025-04-24
PublicationDate_xml – month: 04
  year: 2025
  text: 2025-April-24
  day: 24
PublicationDecade 2020
PublicationTitle Proceedings (International Symposium on Digital Forensic and Security. Online)
PublicationTitleAbbrev ISDFS
PublicationYear 2025
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0003177605
Score 1.9140848
Snippet Ransomware and other forms of malware cause significant financial and operational damage to organizations by exploiting long-standing and often...
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms Accuracy
Bidirectional control
Binary Security
Buffer Over-flows
Codes
Encoding
GPT-2
Long short term memory
Machine Learning
Neural networks
Organizations
Recurrent neural networks
Training
Transformers
Unidirectional Encoders
Title Comparing Unidirectional, Bidirectional, and Word2vec Models for Discovering Vulnerabilities in Compiled Lifted Code
URI https://ieeexplore.ieee.org/document/11012025
hasFullText 1
inHoldings 1
isFullTextHit
isPrint