SLaDe: A Portable Small Language Model Decompiler for Optimized Assembly
| Published in: | Proceedings / International Symposium on Code Generation and Optimization pp. 67 - 80 |
|---|---|
| Main Authors: | Armengol-Estape, Jordi; Woodruff, Jackson; Cummins, Chris; O'Boyle, Michael F.P. |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 02.03.2024 |
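| Subjects: | Codes; decompilation; Engines; language models; neural decompilation; Optimization; Security; Task analysis; Transformer; Transformers; type inference |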
| ISSN: | 2643-2838 |
| Abstract | Decompilation is a well-studied area with numerous high-quality tools available. These are frequently used for security tasks and to port legacy code. However, they regularly generate difficult-to-read programs and require a large amount of engineering effort to support new programming languages and ISAs. Recent interest in neural approaches has produced portable tools that generate readable code. Nevertheless, to date such techniques are usually restricted to synthetic programs without optimization, and no models have been evaluated for portability. Furthermore, while the code generated may be more readable, it is usually incorrect. This paper presents SLaDe, a Small Language model Decompiler based on a sequence-to-sequence Transformer trained over real-world code and augmented with a type inference engine. We utilize a novel tokenizer, dropout-free regularization, and type inference to generate programs that are more readable and accurate than standard analytic and recent neural approaches. Unlike standard approaches, SLaDe can infer out-of-context types, and unlike neural approaches, it generates correct code. We evaluate SLaDe on over 4,000 ExeBench functions on two ISAs and at two optimization levels. SLaDe is up to 6× more accurate than Ghidra, a state-of-the-art, industrial-strength decompiler, and up to 4× more accurate than the large language model ChatGPT, and it generates significantly more readable code than both. |
|---|---|
| Author | Cummins, Chris; Woodruff, Jackson; O'Boyle, Michael F.P.; Armengol-Estape, Jordi |
| Author_xml | – sequence: 1 givenname: Jordi surname: Armengol-Estape fullname: Armengol-Estape, Jordi email: jordi.armengol.estape@ed.ac.uk organization: School of Informatics, University of Edinburgh, Edinburgh, United Kingdom – sequence: 2 givenname: Jackson surname: Woodruff fullname: Woodruff, Jackson email: j.c.woodruff@sms.ed.ac.uk organization: School of Informatics, University of Edinburgh, Edinburgh, United Kingdom – sequence: 3 givenname: Chris surname: Cummins fullname: Cummins, Chris email: cummins@fb.com organization: Meta AI Research, Menlo Park, CA, USA – sequence: 4 givenname: Michael F.P. surname: O'Boyle fullname: O'Boyle, Michael F.P. email: mob@inf.ed.ac.uk organization: School of Informatics, University of Edinburgh, Edinburgh, United Kingdom |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/CGO57630.2024.10444788 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 9798350395099 |
| EISSN | 2643-2838 |
| EndPage | 80 |
| ExternalDocumentID | 10444788 |
| Genre | orig-research |
| GroupedDBID | 29O 6IE 6IF 6IK 6IL 6IN AAJGR ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IPLJI OCL RIE RIL |
| ID | FETCH-LOGICAL-a290t-345fbfba85e7204c6d947c53c3090afb41d979300766f05bd60b10270d28f0353 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 5 |
| IngestDate | Wed Aug 27 02:08:26 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-a290t-345fbfba85e7204c6d947c53c3090afb41d979300766f05bd60b10270d28f0353 |
| PageCount | 14 |
| ParticipantIDs | ieee_primary_10444788 |
| PublicationCentury | 2000 |
| PublicationDate | 2024-March-2 |
| PublicationDateYYYYMMDD | 2024-03-02 |
| PublicationDate_xml | – month: 03 year: 2024 text: 2024-March-2 day: 02 |
| PublicationDecade | 2020 |
| PublicationTitle | Proceedings / International Symposium on Code Generation and Optimization |
| PublicationTitleAbbrev | CGO |
| PublicationYear | 2024 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssj0027413 ssib057256076 |
| Score | 2.354587 |
| Snippet | Decompilation is a well-studied area with numerous high-quality tools available. These are frequently used for security tasks and to port legacy code. However,... |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 67 |
| SubjectTerms | Codes; decompilation; Engines; language models; neural decompilation; Optimization; Security; Task analysis; Transformer; Transformers; type inference |
| Title | SLaDe: A Portable Small Language Model Decompiler for Optimized Assembly |
| URI | https://ieeexplore.ieee.org/document/10444788 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |