VEAL: Virtualized Execution Accelerator for Loops

Bibliographic Details
Published in: 2008 International Symposium on Computer Architecture, pp. 389-400
Main Authors: Clark, Nathan; Hormati, Amir; Mahlke, Scott
Format: Conference Proceeding
Language: English
Published: Washington, DC, USA: IEEE Computer Society; IEEE, June 1, 2008
Series: ACM Conferences
ISBN: 9780769531748, 0769531741
ISSN: 1063-6897
Abstract: Performance improvement solely through transistor scaling is becoming more and more difficult; thus it is increasingly common to see domain-specific accelerators used in conjunction with general-purpose processors to achieve future performance goals. There is a serious drawback to accelerators, though: binary compatibility. An application compiled to utilize an accelerator cannot run on a processor without that accelerator, and applications that do not utilize an accelerator will never use it. To overcome this problem, we propose decoupling the instruction set architecture from the underlying accelerators. Computation to be accelerated is expressed using a processor’s baseline instruction set, and lightweight dynamic translation maps the representation to whatever accelerators are available in the system. In this paper, we describe the changes to a compilation framework and processor system needed to support this abstraction for an important set of accelerator designs that support innermost loops. In this analysis, we investigate the dynamic overheads associated with abstraction as well as the static/dynamic tradeoffs to improve the dynamic mapping of loop nests. As part of the exploration, we also provide a quantitative analysis of the hardware characteristics of effective loop accelerators. We conclude that using a hybrid static-dynamic compilation approach to map computation onto loop-level accelerators is a practical way to increase computation efficiency, without the overheads associated with instruction set modification.
DOI 10.1109/ISCA.2008.33
Discipline Computer Science
MeetingName ISCA08: The 35th Annual International Symposium on Computer Architecture
SubjectTerms Acceleration
Algorithm design and analysis
Application software
Application specific integrated circuits
Computer aided instruction
Computer architecture
Costs
Energy consumption
Hardware
Hardware -- Electronic design automation -- Methodologies for EDA
Hardware -- Integrated circuits -- Logic circuits -- Arithmetic and datapath circuits
Hardware -- Integrated circuits -- Logic circuits -- Design modules and hierarchy
Hardware -- Very large scale integration design -- Application-specific VLSI designs -- Application specific instruction set processors
Multicore processing
Software and its engineering -- Software notations and tools -- Compilers
Software and its engineering -- Software notations and tools -- Compilers -- Runtime environments
Software and its engineering -- Software notations and tools -- General programming languages -- Language features -- Control structures
URI https://ieeexplore.ieee.org/document/4556742