Introducing Instruction-Accurate Simulators for Performance Estimation of Autotuning Workloads

Accelerating Machine Learning (ML) workloads requires efficient methods due to their large optimization space. Autotuning has emerged as an effective approach for systematically evaluating variations of implementations. Traditionally, autotuning requires the workloads to be executed on the target ha...

Bibliographic details
Published in: 2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1-7
Main authors: Pelke, Rebecca; Bosbach, Nils; Reimann, Lennart M.; Leupers, Rainer
Format: Conference paper
Language: English
Published: IEEE, 22 June 2025
Abstract Accelerating Machine Learning (ML) workloads requires efficient methods due to their large optimization space. Autotuning has emerged as an effective approach for systematically evaluating variations of implementations. Traditionally, autotuning requires the workloads to be executed on the target hardware (HW). We present an interface that allows executing autotuning workloads on simulators. This approach offers high scalability when the availability of the target HW is limited, as many simulations can be run in parallel on any accessible HW. Additionally, we evaluate the feasibility of using fast instruction-accurate simulators for autotuning. We train various predictors to forecast the performance of ML workload implementations on the target HW based on simulation statistics. Our results demonstrate that the tuned predictors are highly effective. The best workload implementation in terms of actual run time on the target HW is always within the top 3% of predictions for the tested x86, ARM, and RISC-V-based architectures. In the best case, this approach outperforms native execution on the target HW for embedded architectures when running as few as three samples on three simulators in parallel.
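The "top 3% of predictions" criterion in the abstract can be read as a ranking check: sort the candidate implementations by predicted run time and see how far down the list the truly fastest one appears. The sketch below illustrates that reading only. The function name `rank_percentile_of_best` and the sample numbers are hypothetical, and this is not the authors' evaluation code.

```python
def rank_percentile_of_best(predicted, measured):
    """Percentile position (in (0, 100]) of the truly fastest candidate
    within the ranking induced by the predicted run times.

    predicted: predicted run times, one per candidate implementation
    measured:  run times measured on the target hardware, same order
    """
    if not predicted or len(predicted) != len(measured):
        raise ValueError("inputs must be non-empty and of equal length")

    # Index of the candidate that is actually fastest on the target hardware.
    best = min(range(len(measured)), key=measured.__getitem__)

    # Candidate indices ordered from best to worst predicted run time.
    order = sorted(range(len(predicted)), key=predicted.__getitem__)

    # 1-based position of the truly best candidate in the predicted ranking.
    position = order.index(best) + 1
    return 100.0 * position / len(predicted)


# Hypothetical data: four candidates; the predictor ranks the true best second.
predicted = [5.0, 1.2, 3.3, 0.9]
measured = [50.0, 11.0, 30.0, 12.0]
print(rank_percentile_of_best(predicted, measured))  # 50.0
```

Under this reading, a value of at most 3.0 would mean the fastest implementation sits within the top 3% of the predicted ranking, so only that slice of candidates would need to be re-run on the target hardware.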
Author Leupers, Rainer
Pelke, Rebecca
Reimann, Lennart M.
Bosbach, Nils
Author_xml – sequence: 1
  givenname: Rebecca
  surname: Pelke
  fullname: Pelke, Rebecca
  email: pelke@ice.rwth-aachen.de
  organization: Institute for Communication Technologies and Embedded Systems, RWTH Aachen University, Germany
– sequence: 2
  givenname: Nils
  surname: Bosbach
  fullname: Bosbach, Nils
  email: bosbach@ice.rwth-aachen.de
  organization: Institute for Communication Technologies and Embedded Systems, RWTH Aachen University, Germany
– sequence: 3
  givenname: Lennart M.
  surname: Reimann
  fullname: Reimann, Lennart M.
  email: lennart.reimann@ice.rwth-aachen.de
  organization: Institute for Communication Technologies and Embedded Systems, RWTH Aachen University, Germany
– sequence: 4
  givenname: Rainer
  surname: Leupers
  fullname: Leupers, Rainer
  email: leupers@ice.rwth-aachen.de
  organization: Institute for Communication Technologies and Embedded Systems, RWTH Aachen University, Germany
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1109/DAC63849.2025.11133237
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9798331503048
EndPage 7
ExternalDocumentID 11133237
Genre orig-research
GroupedDBID 6IE
6IH
CBEJK
RIE
RIO
ID FETCH-LOGICAL-a179t-e1be4c6fd34ec5a2af09a136ef1bf94f67a78a11a973fede4f9b17ec79fb72673
IEDL.DBID RIE
IngestDate Wed Oct 01 07:05:15 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-a179t-e1be4c6fd34ec5a2af09a136ef1bf94f67a78a11a973fede4f9b17ec79fb72673
PageCount 7
ParticipantIDs ieee_primary_11133237
PublicationCentury 2000
PublicationDate 2025-June-22
PublicationDateYYYYMMDD 2025-06-22
PublicationDate_xml – month: 06
  year: 2025
  text: 2025-June-22
  day: 22
PublicationDecade 2020
PublicationTitle 2025 62nd ACM/IEEE Design Automation Conference (DAC)
PublicationTitleAbbrev DAC
PublicationYear 2025
Publisher IEEE
Publisher_xml – name: IEEE
Score 2.2956133
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms Autotuning
cache optimization
Design automation
Estimation
gem5
Hardware
Machine learning
Optimization
Predictive models
Scalability
TVM
Title Introducing Instruction-Accurate Simulators for Performance Estimation of Autotuning Workloads
URI https://ieeexplore.ieee.org/document/11133237
hasFullText 1
inHoldings 1
isFullTextHit
isPrint