IRFuzzer: Specialized Fuzzing for LLVM Backend Code Generation

Modern compilers, such as LLVM, are complex. Due to their complexity, manual testing is unlikely to suffice, yet formal verification is difficult to scale. End-to-end fuzzing can be used, but it has difficulties in discovering LLVM backend problems for two reasons. First, frontend preprocessing and...

Celý popis

Uloženo v:

Podrobná bibliografie
Vydáno v:	Proceedings / International Conference on Software Engineering s. 1986 - 1998
Hlavní autoři:	Rong, Yuyang, Yu, Zhanghan, Weng, Zhenkai, Neuendorffer, Stephen, Chen, Hao
Médium:	Konferenční příspěvek
Jazyk:	angličtina
Vydáno:	IEEE 26.04.2025
Témata:	Codes Computer bugs Fuzzing Instruments LLVM Measurement Optimization Semantics Software software analysis Testing Vectors
ISSN:	1558-1225
On-line přístup:	Získat plný text
Tagy:	Přidat tag Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!

Popis
Shrnutí:	Modern compilers, such as LLVM, are complex. Due to their complexity, manual testing is unlikely to suffice, yet formal verification is difficult to scale. End-to-end fuzzing can be used, but it has difficulties in discovering LLVM backend problems for two reasons. First, frontend preprocessing and middle optimization shield the backend from seeing diverse inputs. Second, branch coverage cannot provide effective feedback as LLVM backend contains much reusable code. In this paper, we implement IRFuzzer to investigate the need of specialized fuzzing of the LLVM compiler backend. We focus on two approaches to improve the fuzzer: guaranteed input validity using constrained mutations to improve input diversity and new metrics to improve feedback quality. The mutator in IRFuzzer can generate a wide range of LLVM IR inputs, including structured control flow, vector types, and function definitions. The system instruments coding patterns in the compiler to monitor the execution status of instruction selection. The instrumentation not only provides new coverage feedback on the matcher table but also guides the mutator on architecture-specific intrinsics. We ran IRFuzzer on 29 mature LLVM backend targets. IRFuzzer discovered 78 new, confirmed bugs in LLVM upstream, none of which existing fuzzers could discover. This demonstrates that IRFuzzer is far more effective than existing fuzzers. Upon receiving our bug report, the developers have fixed 57 bugs and back-ported five fixes to LLVM 15, which shows that specialized fuzzing provides actionable insights to LLVM developers.
ISSN:	1558-1225
DOI:	10.1109/ICSE55347.2025.00130