SLaDe: A Portable Small Language Model Decompiler for Optimized Assembly

Decompilation is a well-studied area with numerous high-quality tools available. These are frequently used for security tasks and to port legacy code. However, they regularly generate difficult-to-read programs and require a large amount of engineering effort to support new programming languages and...

Bibliographic Details
Published in: Proceedings / International Symposium on Code Generation and Optimization, pp. 67-80
Main Authors: Armengol-Estape, Jordi; Woodruff, Jackson; Cummins, Chris; O'Boyle, Michael F.P.
Format: Conference Proceeding
Language: English
Published: IEEE, 2 March 2024
ISSN: 2643-2838
Abstract Decompilation is a well-studied area with numerous high-quality tools available. These are frequently used for security tasks and to port legacy code. However, they regularly generate difficult-to-read programs and require a large amount of engineering effort to support new programming languages and ISAs. Recent interest in neural approaches has produced portable tools that generate readable code. Nevertheless, to date such techniques are usually restricted to synthetic programs without optimization, and no models have evaluated their portability. Furthermore, while the code generated may be more readable, it is usually incorrect. This paper presents SLaDe, a Small Language model Decompiler based on a sequence-to-sequence Transformer trained over real-world code and augmented with a type inference engine. We utilize a novel tokenizer, dropout-free regularization, and type inference to generate programs that are more readable and accurate than standard analytic and recent neural approaches. Unlike standard approaches, SLaDe can infer out-of-context types, and unlike neural approaches, it generates correct code. We evaluate SLaDe on over 4,000 ExeBench functions on two ISAs and at two optimization levels. SLaDe is up to 6× more accurate than Ghidra, a state-of-the-art, industrial-strength decompiler, and up to 4× more accurate than the large language model ChatGPT, and generates significantly more readable code than both.
Author Cummins, Chris
Woodruff, Jackson
O'Boyle, Michael F.P.
Armengol-Estape, Jordi
Author_xml – sequence: 1
  givenname: Jordi
  surname: Armengol-Estape
  fullname: Armengol-Estape, Jordi
  email: jordi.armengol.estape@ed.ac.uk
  organization: School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
– sequence: 2
  givenname: Jackson
  surname: Woodruff
  fullname: Woodruff, Jackson
  email: j.c.woodruff@sms.ed.ac.uk
  organization: School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
– sequence: 3
  givenname: Chris
  surname: Cummins
  fullname: Cummins, Chris
  email: cummins@fb.com
  organization: Meta AI Research, Menlo Park, CA, USA
– sequence: 4
  givenname: Michael F.P.
  surname: O'Boyle
  fullname: O'Boyle, Michael F.P.
  email: mob@inf.ed.ac.uk
  organization: School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/CGO57630.2024.10444788
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 9798350395099
EISSN 2643-2838
EndPage 80
ExternalDocumentID 10444788
Genre orig-research
GroupedDBID 29O
6IE
6IF
6IK
6IL
6IN
AAJGR
ABLEC
ADZIZ
ALMA_UNASSIGNED_HOLDINGS
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
IEGSK
IPLJI
OCL
RIE
RIL
ID FETCH-LOGICAL-a290t-345fbfba85e7204c6d947c53c3090afb41d979300766f05bd60b10270d28f0353
IEDL.DBID RIE
ISICitedReferencesCount 5
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001179185400006&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Aug 27 02:08:26 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-a290t-345fbfba85e7204c6d947c53c3090afb41d979300766f05bd60b10270d28f0353
PageCount 14
ParticipantIDs ieee_primary_10444788
PublicationCentury 2000
PublicationDate 2024-March-2
PublicationDateYYYYMMDD 2024-03-02
PublicationDate_xml – month: 03
  year: 2024
  text: 2024-March-2
  day: 02
PublicationDecade 2020
PublicationTitle Proceedings / International Symposium on Code Generation and Optimization
PublicationTitleAbbrev CGO
PublicationYear 2024
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0027413
ssib057256076
Score 2.354587
SourceID ieee
SourceType Publisher
StartPage 67
SubjectTerms Codes
decompilation
Engines
language models
neural decompilation
Optimization
Security
Task analysis
Transformer
Transformers
type inference
Title SLaDe: A Portable Small Language Model Decompiler for Optimized Assembly
URI https://ieeexplore.ieee.org/document/10444788
WOSCitedRecordID wos001179185400006&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint