Introducing Instruction-Accurate Simulators for Performance Estimation of Autotuning Workloads
| Published in: | 2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1-7 |
|---|---|
| Main authors: | Pelke, Rebecca; Bosbach, Nils; Reimann, Lennart M.; Leupers, Rainer |
| Medium: | Conference paper |
| Language: | English |
| Published: | IEEE, 22 June 2025 |
| Abstract | Accelerating Machine Learning (ML) workloads requires efficient methods due to their large optimization space. Autotuning has emerged as an effective approach for systematically evaluating variations of implementations. Traditionally, autotuning requires the workloads to be executed on the target hardware (HW). We present an interface that allows executing autotuning workloads on simulators. This approach offers high scalability when the availability of the target HW is limited, as many simulations can be run in parallel on any accessible HW. Additionally, we evaluate the feasibility of using fast instruction-accurate simulators for autotuning. We train various predictors to forecast the performance of ML workload implementations on the target HW based on simulation statistics. Our results demonstrate that the tuned predictors are highly effective. The best workload implementation in terms of actual run time on the target HW is always within the top 3% of predictions for the tested x86, ARM, and RISC-V-based architectures. In the best case, this approach outperforms native execution on the target HW for embedded architectures when running as few as three samples on three simulators in parallel. |
|---|---|
| Author | Leupers, Rainer; Pelke, Rebecca; Reimann, Lennart M.; Bosbach, Nils |
| Author_xml | 1. Rebecca Pelke (pelke@ice.rwth-aachen.de), Institute for Communication Technologies and Embedded Systems, RWTH Aachen University, Germany; 2. Nils Bosbach (bosbach@ice.rwth-aachen.de), Institute for Communication Technologies and Embedded Systems, RWTH Aachen University, Germany; 3. Lennart M. Reimann (lennart.reimann@ice.rwth-aachen.de), Institute for Communication Technologies and Embedded Systems, RWTH Aachen University, Germany; 4. Rainer Leupers (leupers@ice.rwth-aachen.de), Institute for Communication Technologies and Embedded Systems, RWTH Aachen University, Germany |
| ContentType | Conference Proceeding |
| DOI | 10.1109/DAC63849.2025.11133237 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP) 1998-present |
| EISBN | 9798331503048 |
| EndPage | 7 |
| ExternalDocumentID | 11133237 |
| Genre | orig-research |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| PageCount | 7 |
| PublicationDate | 2025-June-22 |
| PublicationTitle | 2025 62nd ACM/IEEE Design Automation Conference (DAC) |
| PublicationTitleAbbrev | DAC |
| PublicationYear | 2025 |
| Publisher | IEEE |
| StartPage | 1 |
| SubjectTerms | Autotuning; cache optimization; Design automation; Estimation; gem5; Hardware; Machine learning; Optimization; Predictive models; Scalability; TVM |
| Title | Introducing Instruction-Accurate Simulators for Performance Estimation of Autotuning Workloads |
| URI | https://ieeexplore.ieee.org/document/11133237 |
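The abstract's evaluation criterion, that the implementation with the lowest measured run time on the target HW always lands within the top 3% of predictions, can be illustrated with a minimal Python sketch. It assumes that each candidate implementation has one predicted run time from a simulation-based predictor (lower is better) and one measured run time on the target HW. The function name `best_within_top_percent` and the choice to round the 3% cut-off up to at least one whole candidate are illustrative assumptions, not the authors' code.

```python
def best_within_top_percent(predicted, measured, top_percent=3):
    """Hypothetical check: is the truly fastest candidate (by measured run time)
    ranked within the top `top_percent` percent of candidates by predicted run time?"""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("predicted and measured must be non-empty and equally long")
    n = len(predicted)
    # Rank candidate indices by predicted run time, fastest prediction first.
    ranking = sorted(range(n), key=lambda i: predicted[i])
    # Index of the implementation that is actually fastest on the target HW.
    best = min(range(n), key=lambda i: measured[i])
    # Cut-off k = ceil(n * top_percent / 100) in exact integer arithmetic, at least 1.
    k = max(1, (n * top_percent + 99) // 100)
    return best in ranking[:k]
```

For 100 candidates the cut-off is 3, so the check passes only if the truly fastest implementation is among the three best-predicted ones.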