IRFuzzer: Specialized Fuzzing for LLVM Backend Code Generation

Bibliographic Details
Published in: Proceedings / International Conference on Software Engineering, pp. 1986-1998
Main Authors: Rong, Yuyang; Yu, Zhanghan; Weng, Zhenkai; Neuendorffer, Stephen; Chen, Hao
Format: Conference Proceeding
Language: English
Published: IEEE 26.04.2025
ISSN: 1558-1225
Abstract Modern compilers, such as LLVM, are complex. Due to their complexity, manual testing is unlikely to suffice, yet formal verification is difficult to scale. End-to-end fuzzing can be used, but it has difficulty discovering LLVM backend problems for two reasons. First, frontend preprocessing and middle-end optimization shield the backend from diverse inputs. Second, branch coverage cannot provide effective feedback because the LLVM backend contains much reusable code. In this paper, we implement IRFuzzer to investigate the need for specialized fuzzing of the LLVM compiler backend. We focus on two approaches to improve the fuzzer: constrained mutations that guarantee input validity and improve input diversity, and new metrics that improve feedback quality. The mutator in IRFuzzer can generate a wide range of LLVM IR inputs, including structured control flow, vector types, and function definitions. The system instruments coding patterns in the compiler to monitor the execution status of instruction selection. The instrumentation not only provides new coverage feedback on the matcher table but also guides the mutator on architecture-specific intrinsics. We ran IRFuzzer on 29 mature LLVM backend targets. IRFuzzer discovered 78 new, confirmed bugs in LLVM upstream, none of which existing fuzzers could discover. This demonstrates that IRFuzzer is far more effective than existing fuzzers. Upon receiving our bug reports, the developers fixed 57 bugs and back-ported five fixes to LLVM 15, which shows that specialized fuzzing provides actionable insights to LLVM developers.
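A minimal sketch, for illustration only, of why coverage over matcher-table entries can keep giving feedback after branch coverage saturates. This is not IRFuzzer's code: the table layout, the entry kinds, the opcodes, and the names TABLE, select and new_coverage are all hypothetical. In LLVM, SelectionDAG instruction selection interprets a TableGen-generated matcher table in one shared loop (SelectionDAGISel::SelectCodeCommon), so many different patterns reuse the same few branches. The toy interpreter below mimics that by recording both the interpreter branches taken and the table indices visited.

```python
from typing import List, Set, Tuple

# Hypothetical toy matcher table. Each entry is (kind, operand, fail_target):
#   "CheckOpcode": continue to the next entry if the node's opcode equals
#                  operand, otherwise jump to fail_target.
#   "Emit":        stop and return operand as the selected instruction.
Entry = Tuple[str, str, int]

TABLE: List[Entry] = [
    ("CheckOpcode", "add", 2),  # 0: pattern for add
    ("Emit", "ADDrr", -1),      # 1
    ("CheckOpcode", "mul", 4),  # 2: pattern for mul
    ("Emit", "MULrr", -1),      # 3
    ("CheckOpcode", "shl", 6),  # 4: pattern for shl
    ("Emit", "SHLri", -1),      # 5
    ("Emit", "FALLBACK", -1),   # 6: no pattern matched
]


def select(opcode: str, branch_cov: Set[str], table_cov: Set[int]) -> str:
    """Interpret TABLE for one node, recording two kinds of coverage."""
    idx = 0
    while True:
        table_cov.add(idx)  # matcher-table coverage: which entry was visited
        kind, operand, fail_target = TABLE[idx]
        if kind == "Emit":
            branch_cov.add("emit")  # branch coverage: which interpreter branch ran
            return operand
        if opcode == operand:
            branch_cov.add("check-pass")
            idx += 1
        else:
            branch_cov.add("check-fail")
            idx = fail_target


def new_coverage(opcode: str, branch_cov: Set[str], table_cov: Set[int]) -> Tuple[bool, bool]:
    """Run one input; report whether it added branch and/or table coverage."""
    branches_before, entries_before = len(branch_cov), len(table_cov)
    select(opcode, branch_cov, table_cov)
    return len(branch_cov) > branches_before, len(table_cov) > entries_before


if __name__ == "__main__":
    branch_cov: Set[str] = set()
    table_cov: Set[int] = set()
    for op in ["mul", "shl", "add"]:
        print(op, *new_coverage(op, branch_cov, table_cov))
    # prints: mul True True / shl False True / add False True
```

In this toy run, "shl" and "add" reach new patterns yet add no new branch coverage, while table-index coverage still grows; a fuzzer keyed only on branches would discard those inputs as uninteresting.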
Author_xml – sequence: 1
  givenname: Yuyang
  surname: Rong
  fullname: Rong, Yuyang
  email: PeterRong96@gmail.com
  organization: Advanced Micro Devices, Inc
– sequence: 2
  givenname: Zhanghan
  surname: Yu
  fullname: Yu, Zhanghan
  email: hnryu@ucdavis.edu
  organization: University of California, Davis
– sequence: 3
  givenname: Zhenkai
  surname: Weng
  fullname: Weng, Zhenkai
  email: zweng@ucdavis.edu
  organization: University of California, Davis
– sequence: 4
  givenname: Stephen
  surname: Neuendorffer
  fullname: Neuendorffer, Stephen
  email: stephen.neuendorffer@amd.com
  organization: Advanced Micro Devices, Inc
– sequence: 5
  givenname: Hao
  surname: Chen
  fullname: Chen, Hao
  email: chen@ucdavis.edu
  organization: University of California, Davis
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/ICSE55347.2025.00130
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 9798331505691
EISSN 1558-1225
EndPage 1998
ExternalDocumentID 11029772
Genre orig-research
ISICitedReferencesCount 0
IngestDate Wed Aug 27 01:40:13 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
PageCount 13
ParticipantIDs ieee_primary_11029772
PublicationCentury 2000
PublicationDate 2025-April-26
PublicationDateYYYYMMDD 2025-04-26
PublicationDate_xml – month: 04
  year: 2025
  text: 2025-April-26
  day: 26
PublicationDecade 2020
PublicationTitle Proceedings / International Conference on Software Engineering
PublicationTitleAbbrev ICSE
PublicationYear 2025
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0006499
SourceID ieee
SourceType Publisher
StartPage 1986
SubjectTerms Codes
Computer bugs
Fuzzing
Instruments
LLVM
Measurement
Optimization
Semantics
Software
software analysis
Testing
Vectors
Title IRFuzzer: Specialized Fuzzing for LLVM Backend Code Generation
URI https://ieeexplore.ieee.org/document/11029772
WOSCitedRecordID wos001538318100155
hasFullText 1
inHoldings 1
isFullTextHit
isPrint