Introducing Instruction-Accurate Simulators for Performance Estimation of Autotuning Workloads

Accelerating Machine Learning (ML) workloads requires efficient methods due to their large optimization space. Autotuning has emerged as an effective approach for systematically evaluating variations of implementations. Traditionally, autotuning requires the workloads to be executed on the target ha...

Detailed bibliography
Published in:	2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1-7
Main authors:	Pelke, Rebecca; Bosbach, Nils; Reimann, Lennart M.; Leupers, Rainer
Format:	Conference paper
Language:	English
Published:	IEEE, 22 June 2025
Subjects:	Autotuning; cache optimization; Design automation; Estimation; gem5; Hardware; Machine learning; Optimization; Predictive models; Scalability; TVM

Abstract	Accelerating Machine Learning (ML) workloads requires efficient methods due to their large optimization space. Autotuning has emerged as an effective approach for systematically evaluating variations of implementations. Traditionally, autotuning requires the workloads to be executed on the target hardware (HW). We present an interface that allows executing autotuning workloads on simulators. This approach offers high scalability when the availability of the target HW is limited, as many simulations can be run in parallel on any accessible HW. Additionally, we evaluate the feasibility of using fast instruction-accurate simulators for autotuning. We train various predictors to forecast the performance of ML workload implementations on the target HW based on simulation statistics. Our results demonstrate that the tuned predictors are highly effective. The best workload implementation in terms of actual run time on the target HW is always within the top 3% of predictions for the tested x86, ARM, and RISC-V-based architectures. In the best case, this approach outperforms native execution on the target HW for embedded architectures when running as few as three samples on three simulators in parallel.
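The "top 3% of predictions" criterion in the abstract can be read as a rank check: order all candidate implementations by predicted run time and test whether the candidate that is actually fastest on the target HW falls into the leading 3% of that order. The sketch below illustrates that reading only; it is not taken from the paper, and the function name `best_within_top_fraction`, its parameters, and the sample data are hypothetical.

```python
import math


def best_within_top_fraction(predicted, measured, fraction=0.03):
    """Return True if the candidate with the smallest measured run time
    is among the top `fraction` of candidates ranked by predicted run time.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    # Index of the candidate that is actually fastest on the target HW.
    best = min(range(len(measured)), key=measured.__getitem__)
    # Candidate indices ordered from fastest to slowest according to the predictor.
    ranking = sorted(range(len(predicted)), key=predicted.__getitem__)
    # Size of the top slice; at least one candidate is always kept.
    k = max(1, math.ceil(fraction * len(predicted)))
    return best in ranking[:k]


# Made-up data: 100 candidates whose measured run times are 1.0, 2.0, ..., 100.0.
measured = [float(i + 1) for i in range(100)]

# A predictor that ranks the truly fastest candidate second.
good_prediction = list(measured)
good_prediction[0] = 2.5

# A predictor that ranks the truly fastest candidate near the middle.
bad_prediction = list(measured)
bad_prediction[0] = 50.5

print(best_within_top_fraction(good_prediction, measured))
print(best_within_top_fraction(bad_prediction, measured))
```

Only the ordering of the predictions matters for this check, so a predictor can be off in absolute run time and still pass as long as it ranks the fastest candidate near the top.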
Author	Leupers, Rainer Pelke, Rebecca Reimann, Lennart M. Bosbach, Nils
Author_xml	– sequence: 1 givenname: Rebecca surname: Pelke fullname: Pelke, Rebecca email: pelke@ice.rwth-aachen.de organization: Institute for Communication Technologies and Embedded Systems RWTH Aachen University, Germany – sequence: 2 givenname: Nils surname: Bosbach fullname: Bosbach, Nils email: bosbach@ice.rwth-aachen.de organization: Institute for Communication Technologies and Embedded Systems RWTH Aachen University, Germany – sequence: 3 givenname: Lennart M. surname: Reimann fullname: Reimann, Lennart M. email: lennart.reimann@ice.rwth-aachen.de organization: Institute for Communication Technologies and Embedded Systems RWTH Aachen University, Germany – sequence: 4 givenname: Rainer surname: Leupers fullname: Leupers, Rainer email: leupers@ice.rwth-aachen.de organization: Institute for Communication Technologies and Embedded Systems RWTH Aachen University, Germany
ContentType	Conference Proceeding
DBID	6IE 6IH CBEJK RIE RIO
DOI	10.1109/DAC63849.2025.11133237
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan (POP) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml	– sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher
DeliveryMethod	fulltext_linktorsrc
EISBN	9798331503048
EndPage	7
ExternalDocumentID	11133237
Genre	orig-research
GroupedDBID	6IE 6IH CBEJK RIE RIO
ID	FETCH-LOGICAL-a179t-e1be4c6fd34ec5a2af09a136ef1bf94f67a78a11a973fede4f9b17ec79fb72673
IEDL.DBID	RIE
IngestDate	Wed Oct 01 07:05:15 EDT 2025
IsPeerReviewed	false
IsScholarly	true
Language	English
LinkModel	DirectLink
MergedId	FETCHMERGED-LOGICAL-a179t-e1be4c6fd34ec5a2af09a136ef1bf94f67a78a11a973fede4f9b17ec79fb72673
PageCount	7
ParticipantIDs	ieee_primary_11133237
PublicationCentury	2000
PublicationDate	2025-June-22
PublicationDateYYYYMMDD	2025-06-22
PublicationDate_xml	– month: 06 year: 2025 text: 2025-June-22 day: 22
PublicationDecade	2020
PublicationTitle	2025 62nd ACM/IEEE Design Automation Conference (DAC)
PublicationTitleAbbrev	DAC
PublicationYear	2025
Publisher	IEEE
Publisher_xml	– name: IEEE
Score	2.2956133
Snippet	Accelerating Machine Learning (ML) workloads requires efficient methods due to their large optimization space. Autotuning has emerged as an effective approach...
SourceID	ieee
SourceType	Publisher
StartPage	1
SubjectTerms	Autotuning cache optimization Design automation Estimation gem5 Hardware Machine learning Optimization Predictive models Scalability TVM
Title	Introducing Instruction-Accurate Simulators for Performance Estimation of Autotuning Workloads
URI	https://ieeexplore.ieee.org/document/11133237
hasFullText	1
inHoldings	1
isFullTextHit
isPrint
link	http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV1NSwMxEA22ePCkYsVvcvCadpNsk82x1BZ7KQUVerIksxMoaFfaXX9_k93W4sGDtzAEAjMZJpmZN4-QxwQUOAmSmdxalmIsEkpEFoKJBEApLTRkE3o6zeZzM9uB1WssDCLWzWfYjcu6lp8XUMVUWS_SokshdYu0tFYNWGuH-uWJ6T0NhuE2pRF-Ivrd_eZftCl11Bif_vO8M9I54O_o7CeynJMjXF2Q90lsKs8rCCI6OQx-ZQOAKg58oC_Lz0jGVaw3NDxF6eyACaCj4MkNSJEWng6qsiirmBGhMVn-Udh80yFv49Hr8Jnt2BGYDU5UMuQOU1A-lylC3wrrE2O5VOi58yb1SludWc6t0dJjjqk3jmsEbbzTQml5SdqrYoVXhHplZSIMZjILPuyF1UnmQEP4TDiX9PGadKJyFl_NAIzFXi83f8hvyUk0QeyoEuKOtINC8J4cw3e53KwfarNtAb_wnec
linkProvider	IEEE
linkToHtml	http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV1NSwMxEA1aBT2pWPHbHLxuu5tkN5tjqZYWaylYoSdLMjuBgnal3fX3m-y2Fg8evIUhEJjJMMnMvHmE3IeQgOHAA5VpHQj0RUKOGLhgwgGQcw012YQcjdLpVI3XYPUKC4OIVfMZtvyyquVnOZQ-Vdb2tOiccblL9mIhWFjDtda43yhU7YdO190n4QEoLG5ttv8iTqniRu_onycek-YWgUfHP7HlhOzg4pS8DXxbeVaCE9HBdvRr0AEo_cgH-jL_8HRc-XJF3WOUjreoAProfLmGKdLc0k5Z5EXpcyLUp8vfc52tmuS19zjp9oM1P0KgnRsVAUYGBSQ24wIh1kzbUOmIJ2gjY5WwidQy1VGkleQWMxRWmUgiSGWNZInkZ6SxyBd4TqhNNA-ZwpSnzost0zJMDUhw3wljwhgvSNMrZ_ZZj8CYbfRy-Yf8jhz0J8_D2XAweroih94cvr-KsWvScMrBG7IPX8V8tbytTPgNIhShLg
openUrl	ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=2025+62nd+ACM%2FIEEE+Design+Automation+Conference+%28DAC%29&rft.atitle=Introducing+Instruction-Accurate+Simulators+for+Performance+Estimation+of+Autotuning+Workloads&rft.au=Pelke%2C+Rebecca&rft.au=Bosbach%2C+Nils&rft.au=Reimann%2C+Lennart+M.&rft.au=Leupers%2C+Rainer&rft.date=2025-06-22&rft.pub=IEEE&rft.spage=1&rft.epage=7&rft_id=info:doi/10.1109%2FDAC63849.2025.11133237&rft.externalDocID=11133237