Comparing Unidirectional, Bidirectional, and Word2vec Models for Discovering Vulnerabilities in Compiled Lifted Code
Ransomware and other forms of malware cause significant financial and operational damage to organizations by exploiting long-standing and often difficult-to-detect software vulnerabilities. To detect vulnerabilities such as buffer overflows in compiled code, this research investigates the applicatio...
| Published in: | Proceedings (International Symposium on Digital Forensic and Security. Online), pp. 1-6 |
|---|---|
| Main authors: | McCully, Gary A.; Hastings, John D.; Xu, Shengjie; Fortier, Adam |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 24 April 2025 |
| ISSN: | 2768-1831 |
| Abstract | Ransomware and other forms of malware cause significant financial and operational damage to organizations by exploiting long-standing and often difficult-to-detect software vulnerabilities. To detect vulnerabilities such as buffer overflows in compiled code, this research investigates the application of unidirectional transformer-based embeddings, specifically GPT-2. Using a dataset of LLVM functions, we trained a GPT-2 model to generate embeddings, which were subsequently used to build LSTM neural networks to differentiate between vulnerable and non-vulnerable code. Our study reveals that embeddings from the GPT-2 model significantly outperform those from bidirectional models of BERT and RoBERTa, achieving an accuracy of 92.5% and an F1-score of 89.7%. LSTM neural networks were developed with both frozen and unfrozen embedding model layers. The model with the highest performance was achieved when the embedding layers were unfrozen. Further, the research finds that, in exploring the impact of different optimizers within this domain, the SGD optimizer demonstrates superior performance over Adam. Overall, these findings reveal important insights into the potential of unidirectional transformer-based approaches in enhancing cybersecurity defenses. |
|---|---|
| Author | Fortier, Adam; Hastings, John D.; Xu, Shengjie; McCully, Gary A. |
| Author_xml | 1. McCully, Gary A. (gary.mccully@ieee.org), Dakota State University, Madison, SD, USA; 2. Hastings, John D. (john.hastings@dsu.edu), Dakota State University, Madison, SD, USA; 3. Xu, Shengjie (sjxu@arizona.edu), University of Arizona, Tucson, AZ, USA; 4. Fortier, Adam (afortier8@gatech.edu), Georgia Institute of Technology, Atlanta, GA, USA |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/ISDFS65363.2025.11012025 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Social Welfare & Social Work |
| EISBN | 9798331509934 |
| EISSN | 2768-1831 |
| EndPage | 6 |
| ExternalDocumentID | 11012025 |
| Genre | orig-research |
| IEDL.DBID | RIE |
| IngestDate | Wed Aug 27 01:48:11 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| LinkModel | DirectLink |
| PageCount | 6 |
| ParticipantIDs | ieee_primary_11012025 |
| PublicationCentury | 2000 |
| PublicationDate | 2025-April-24 |
| PublicationDateYYYYMMDD | 2025-04-24 |
| PublicationDate_xml | – month: 04 year: 2025 text: 2025-April-24 day: 24 |
| PublicationDecade | 2020 |
| PublicationTitle | Proceedings (International Symposium on Digital Forensic and Security. Online) |
| PublicationTitleAbbrev | ISDFS |
| PublicationYear | 2025 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssj0003177605 |
| Score | 1.9140848 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 1 |
| SubjectTerms | Accuracy; Bidirectional control; Binary Security; Buffer Over-flows; Codes; Encoding; GPT-2; Long short term memory; Machine Learning; Neural networks; Organizations; Recurrent neural networks; Training; Transformers; Unidirectional Encoders |
| Title | Comparing Unidirectional, Bidirectional, and Word2vec Models for Discovering Vulnerabilities in Compiled Lifted Code |
| URI | https://ieeexplore.ieee.org/document/11012025 |
| hasFullText | 1 |
| inHoldings | 1 |
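
The abstract in this record reports both an accuracy and an F1-score for the vulnerable/non-vulnerable classifier. As a minimal sketch of how those two metrics are conventionally computed for a binary classifier, the Python below derives both from confusion-matrix counts. The counts are hypothetical values chosen only for illustration; they are not taken from the paper, and the function names `accuracy` and `f1_score` are assumptions of this sketch, not the authors' code.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all functions that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for the positive (vulnerable) class."""
    precision = tp / (tp + fp)  # share of predicted-vulnerable that really are
    recall = tp / (tp + fn)     # share of truly vulnerable that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical confusion-matrix counts (illustration only, not the paper's data):
# tp = vulnerable and flagged, fp = safe but flagged,
# fn = vulnerable but missed, tn = safe and not flagged.
tp, fp, fn, tn = 40, 5, 5, 50

acc = accuracy(tp, fp, fn, tn)
f1 = f1_score(tp, fp, fn)
print(f"accuracy={acc:.3f} f1={f1:.3f}")
```

Because F1 ignores true negatives, it can sit below accuracy when non-vulnerable samples are the majority, which is one common reason the two figures are reported side by side.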