Efficient Selection of Vector Instructions Using Dynamic Programming

Bibliographic Details
Published in: 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, pp. 201-212
Main Authors: Barik, Rajkishore; Zhao, Jisheng; Sarkar, Vivek
Format: Conference Proceeding
Language: English
Published: IEEE 01.12.2010
Subjects:
ISBN:1424490715, 9781424490714
ISSN:1072-4451
Abstract: Accelerating program performance via SIMD vector units is very common in modern processors, as evidenced by the use of SSE, MMX, VSE, and VSX SIMD instructions in multimedia, scientific, and embedded applications. To take full advantage of the vector capabilities, a compiler needs to generate efficient vector code automatically. However, most commercial and open-source compilers fall short of using the full potential of vector units, and only generate vector code for simple innermost loops. In this paper, we present the design and implementation of an auto-vectorization framework in the back-end of a dynamic compiler that not only generates optimized vector code but is also well integrated with the instruction scheduler and register allocator. The framework includes a novel compile-time efficient dynamic programming-based vector instruction selection algorithm for straight-line code that expands opportunities for vectorization in the following ways: (1) scalar packing explores opportunities of packing multiple scalar variables into short vectors; (2) judicious use of shuffle and horizontal vector operations, when possible; and (3) algebraic reassociation expands opportunities for vectorization by algebraic simplification. We report performance results on the impact of auto-vectorization on a set of standard numerical benchmarks using the Jikes RVM dynamic compilation environment. Our results show performance improvement of up to 57.71% on an Intel Xeon processor, compared to non-vectorized execution, with a modest increase in compile time in the range from 0.87% to 9.992%. An investigation of the SIMD parallelization performed by v11.1 of the Intel Fortran Compiler (IFC) on three benchmarks shows that our system achieves speedup with vectorization in all three cases and IFC does not. Finally, a comparison of our approach with an implementation of the Superword Level Parallelization (SLP) algorithm shows that our approach yields a performance improvement of up to 13.78% relative to SLP.
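The abstract names a dynamic programming-based instruction selection over straight-line code but does not give the algorithm. As a rough illustration of the general idea only, and not the authors' method, the sketch below runs a bottom-up dynamic program over a small expression tree: each node records the cheapest way to produce its result in scalar form and in vector form, and pays a packing or unpacking cost whenever an operand arrives in the other form. The `Node` class, the `select` function, and all cost constants are hypothetical and chosen for this example.

```python
from dataclasses import dataclass, field

# Hypothetical costs; a real compiler would take these from a machine model.
SCALAR_COST = 4   # assumed cost of one operation done lane by lane in scalar code
VECTOR_COST = 1   # assumed cost of the same operation as one vector instruction
PACK_COST = 2     # assumed cost of packing scalar values into a short vector
UNPACK_COST = 2   # assumed cost of extracting scalars from a vector


@dataclass
class Node:
    """One operation in a straight-line dependence tree (illustrative only)."""
    name: str
    vectorizable: bool = True
    children: list = field(default_factory=list)


def select(node):
    """Return (best cost with a scalar result, best cost with a vector result)."""
    scalar = SCALAR_COST
    vector = VECTOR_COST if node.vectorizable else float("inf")
    for child in node.children:
        child_scalar, child_vector = select(child)
        # A scalar operation may consume a vector operand after unpacking it.
        scalar += min(child_scalar, child_vector + UNPACK_COST)
        # A vector operation may consume scalar operands after packing them.
        vector += min(child_vector, child_scalar + PACK_COST)
    return scalar, vector


mul = Node("mul", children=[Node("load_a"), Node("load_b")])
store = Node("store", children=[mul])
print(select(store))
```

Because each node keeps a best cost for both result forms, a parent that cannot be vectorized can still reuse a vectorized subtree by paying the unpacking cost once, which is the kind of trade-off a cost-driven selector has to weigh.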
Author Barik, Rajkishore
Zhao, Jisheng
Sarkar, Vivek
Author_xml – sequence: 1
  givenname: Rajkishore
  surname: Barik
  fullname: Barik, Rajkishore
  email: rajkishore.barik@intel.com
  organization: Intel Corp., Santa Clara, CA, USA
– sequence: 2
  givenname: Jisheng
  surname: Zhao
  fullname: Zhao, Jisheng
  email: jisheng.zhao@rice.edu
  organization: Rice Univ., Houston, TX, USA
– sequence: 3
  givenname: Vivek
  surname: Sarkar
  fullname: Sarkar, Vivek
  email: vsarkar@rice.edu
  organization: Rice Univ., Houston, TX, USA
ContentType Conference Proceeding
DOI 10.1109/MICRO.2010.38
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
Discipline Computer Science
EndPage 212
ExternalDocumentID 5695537
Genre orig-research
ISBN 1424490715
9781424490714
ISSN 1072-4451
IngestDate Wed Aug 27 03:19:11 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
PageCount 12
ParticipantIDs ieee_primary_5695537
PublicationCentury 2000
PublicationDate 2010-12
PublicationDateYYYYMMDD 2010-12-01
PublicationDate_xml – month: 12
  year: 2010
  text: 2010-12
PublicationDecade 2010
PublicationTitle 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
PublicationTitleAbbrev micro
PublicationYear 2010
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 201
SubjectTerms Benchmark testing
Dynamic compiler
Dynamic Optimization
Dynamic programming
Heuristic algorithms
Instruction Selection
Optimization
Registers
Tiles
Vectorization
Title Efficient Selection of Vector Instructions Using Dynamic Programming
URI https://ieeexplore.ieee.org/document/5695537