IRFuzzer: Specialized Fuzzing for LLVM Backend Code Generation
Modern compilers, such as LLVM, are complex. Due to their complexity, manual testing is unlikely to suffice, yet formal verification is difficult to scale. End-to-end fuzzing can be used, but it has difficulties in discovering LLVM backend problems for two reasons. First, frontend preprocessing and...
| Published in: | Proceedings / International Conference on Software Engineering, pp. 1986-1998 |
|---|---|
| Main authors: | Rong, Yuyang; Yu, Zhanghan; Weng, Zhenkai; Neuendorffer, Stephen; Chen, Hao |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 26 April 2025 |
| Subjects: | |
| ISSN: | 1558-1225 |
| Online access: | Get full text |
| Abstract | Modern compilers, such as LLVM, are complex. Due to their complexity, manual testing is unlikely to suffice, yet formal verification is difficult to scale. End-to-end fuzzing can be used, but it has difficulty discovering LLVM backend problems for two reasons. First, frontend preprocessing and middle-end optimization shield the backend from seeing diverse inputs. Second, branch coverage cannot provide effective feedback because the LLVM backend contains much reusable code. In this paper, we implement IRFuzzer to investigate the need for specialized fuzzing of the LLVM compiler backend. We focus on two approaches to improving the fuzzer: guaranteed input validity using constrained mutations to improve input diversity, and new metrics to improve feedback quality. The mutator in IRFuzzer can generate a wide range of LLVM IR inputs, including structured control flow, vector types, and function definitions. The system instruments coding patterns in the compiler to monitor the execution status of instruction selection. The instrumentation not only provides new coverage feedback on the matcher table but also guides the mutator on architecture-specific intrinsics. We ran IRFuzzer on 29 mature LLVM backend targets. IRFuzzer discovered 78 new, confirmed bugs in LLVM upstream, none of which existing fuzzers could discover. This demonstrates that IRFuzzer is far more effective than existing fuzzers. Upon receiving our bug reports, the developers fixed 57 bugs and back-ported five fixes to LLVM 15, which shows that specialized fuzzing provides actionable insights to LLVM developers. |
|---|---|
| Author | Yu, Zhanghan; Weng, Zhenkai; Neuendorffer, Stephen; Rong, Yuyang; Chen, Hao |
| Author_xml | – sequence: 1 givenname: Yuyang surname: Rong fullname: Rong, Yuyang email: PeterRong96@gmail.com organization: Advanced Micro Devices, Inc – sequence: 2 givenname: Zhanghan surname: Yu fullname: Yu, Zhanghan email: hnryu@ucdavis.edu organization: University of California,Davis – sequence: 3 givenname: Zhenkai surname: Weng fullname: Weng, Zhenkai email: zweng@ucdavis.edu organization: University of California,Davis – sequence: 4 givenname: Stephen surname: Neuendorffer fullname: Neuendorffer, Stephen email: stephen.neuendorffer@amd.com organization: Advanced Micro Devices, Inc – sequence: 5 givenname: Hao surname: Chen fullname: Chen, Hao email: chen@ucdavis.edu organization: University of California,Davis |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IH CBEJK RIE RIO |
| DOI | 10.1109/ICSE55347.2025.00130 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan (POP) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP) 1998-present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 9798331505691 |
| EISSN | 1558-1225 |
| EndPage | 1998 |
| ExternalDocumentID | 11029772 |
| Genre | orig-research |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 0 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001538318100155&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Wed Aug 27 01:40:13 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| PageCount | 13 |
| ParticipantIDs | ieee_primary_11029772 |
| PublicationCentury | 2000 |
| PublicationDate | 2025-April-26 |
| PublicationDateYYYYMMDD | 2025-04-26 |
| PublicationDate_xml | – month: 04 year: 2025 text: 2025-April-26 day: 26 |
| PublicationDecade | 2020 |
| PublicationTitle | Proceedings / International Conference on Software Engineering |
| PublicationTitleAbbrev | ICSE |
| PublicationYear | 2025 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssj0006499 |
| Score | 2.2967312 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 1986 |
| SubjectTerms | Codes; Computer bugs; Fuzzing; Instruments; LLVM; Measurement; Optimization; Semantics; Software; software analysis; Testing; Vectors |
| Title | IRFuzzer: Specialized Fuzzing for LLVM Backend Code Generation |
| URI | https://ieeexplore.ieee.org/document/11029772 |
| WOSCitedRecordID | wos001538318100155&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
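
The abstract describes two ideas: mutations constrained so that every generated input stays valid, and coverage feedback measured on the instruction selection matcher table instead of on branches. The following is only a toy sketch of how those two ideas can fit together in a feedback loop. It is not IRFuzzer's implementation: the opcode and type lists, the flat table indexing, and all function names (`matched_entries`, `mutate`, `fuzz`) are hypothetical and chosen purely for illustration.

```python
import random

# Hypothetical, tiny "instruction set"; the real LLVM IR is far richer.
OPCODES = ["add", "mul", "shl", "load", "store"]
TYPES = ["i8", "i32", "i64", "v4i32"]
TABLE_SIZE = len(OPCODES) * len(TYPES)


def matched_entries(program):
    """Toy stand-in for instrumented instruction selection.

    Returns the set of (assumed) matcher-table indices that the
    program's instructions would visit.
    """
    return {OPCODES.index(op) * len(TYPES) + TYPES.index(ty)
            for op, ty in program}


def mutate(program, rng):
    """Constrained mutation: replace one instruction with another drawn
    only from the valid opcode and type sets, so the result is valid
    by construction."""
    mutated = list(program)
    pos = rng.randrange(len(mutated))
    mutated[pos] = (rng.choice(OPCODES), rng.choice(TYPES))
    return mutated


def fuzz(iterations, seed=0):
    """Keep a mutated input only if it reaches table entries that no
    earlier input reached (table coverage used as feedback)."""
    rng = random.Random(seed)
    corpus = [[("add", "i32")]]
    covered = set(matched_entries(corpus[0]))
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        new_entries = matched_entries(candidate) - covered
        if new_entries:
            covered |= new_entries
            corpus.append(candidate)
    return corpus, covered


if __name__ == "__main__":
    corpus, covered = fuzz(2000)
    print(f"kept {len(corpus)} inputs, covered {len(covered)}/{TABLE_SIZE} entries")
```

In this sketch an input is retained only when it adds table coverage, so the corpus grows with distinct table entries reached, not with the number of mutations tried. A real backend fuzzer would obtain the visited entries from instrumentation inside the compiler instead of computing them from the input.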