VEAL: Virtualized Execution Accelerator for Loops

Bibliographic Details
Published in:	2008 International Symposium on Computer Architecture, pp. 389-400
Main Authors:	Clark, Nathan; Hormati, Amir; Mahlke, Scott
Format:	Conference Proceeding
Language:	English
Published:	Washington, DC, USA: IEEE Computer Society / IEEE, 01.06.2008
Series:	ACM Conferences
Subjects:	Acceleration; Algorithm design and analysis; Application software; Application specific integrated circuits; Computer aided instruction; Computer architecture; Costs; Energy consumption; Hardware; Hardware > Electronic design automation > Methodologies for EDA; Hardware > Integrated circuits > Logic circuits > Arithmetic and datapath circuits; Hardware > Integrated circuits > Logic circuits > Design modules and hierarchy; Hardware > Very large scale integration design > Application-specific VLSI designs > Application specific instruction set processors; Multicore processing; Software and its engineering > Software notations and tools > Compilers; Software and its engineering > Software notations and tools > Compilers > Runtime environments; Software and its engineering > Software notations and tools > General programming languages > Language features > Control structures
ISBN:	9780769531748, 0769531741
ISSN:	1063-6897

Abstract	Performance improvement solely through transistor scaling is becoming more and more difficult; thus, it is increasingly common to see domain-specific accelerators used in conjunction with general-purpose processors to achieve future performance goals. There is a serious drawback to accelerators, though: binary compatibility. An application compiled to utilize an accelerator cannot run on a processor without that accelerator, and applications that do not utilize an accelerator will never use it. To overcome this problem, we propose decoupling the instruction set architecture from the underlying accelerators. Computation to be accelerated is expressed using a processor’s baseline instruction set, and light-weight dynamic translation maps the representation to whatever accelerators are available in the system. In this paper, we describe the changes to a compilation framework and processor system needed to support this abstraction for an important set of accelerator designs that support innermost loops. In this analysis, we investigate the dynamic overheads associated with abstraction as well as the static/dynamic tradeoffs to improve the dynamic mapping of loop-nests. As part of the exploration, we also provide a quantitative analysis of the hardware characteristics of effective loop accelerators. We conclude that using a hybrid static-dynamic compilation approach to map computation onto loop-level accelerators is a practical way to increase computation efficiency, without the overheads associated with instruction set modification.
DOI	10.1109/ISCA.2008.33
Discipline	Computer Science
MeetingName	ISCA08: The 35th Annual International Symposium on Computer Architecture
URI	https://ieeexplore.ieee.org/document/4556742