Efficient Selection of Vector Instructions Using Dynamic Programming

Bibliographic Details
Published in:	2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, pp. 201-212
Main Authors:	Barik, Rajkishore; Zhao, Jisheng; Sarkar, Vivek
Format:	Conference Proceeding
Language:	English
Published:	IEEE 01.12.2010
Subjects:	Benchmark testing; Dynamic compiler; Dynamic optimization; Dynamic programming; Heuristic algorithms; Instruction selection; Optimization; Registers; Tiles; Vectorization
ISBN:	1424490715, 9781424490714
ISSN:	1072-4451

Abstract	Accelerating program performance via SIMD vector units is very common in modern processors, as evidenced by the use of SSE, MMX, VSE, and VSX SIMD instructions in multimedia, scientific, and embedded applications. To take full advantage of the vector capabilities, a compiler needs to generate efficient vector code automatically. However, most commercial and open-source compilers fall short of using the full potential of vector units, and only generate vector code for simple innermost loops. In this paper, we present the design and implementation of an auto-vectorization framework in the back-end of a dynamic compiler that not only generates optimized vector code but is also well integrated with the instruction scheduler and register allocator. The framework includes a novel, compile-time efficient, dynamic programming-based vector instruction selection algorithm for straight-line code that expands opportunities for vectorization in the following ways: (1) scalar packing explores opportunities for packing multiple scalar variables into short vectors; (2) shuffle and horizontal vector operations are used judiciously, when possible; and (3) algebraic reassociation expands opportunities for vectorization by algebraic simplification. We report performance results on the impact of auto-vectorization on a set of standard numerical benchmarks using the Jikes RVM dynamic compilation environment. Our results show performance improvement of up to 57.71% on an Intel Xeon processor, compared to non-vectorized execution, with a modest increase in compile time in the range from 0.87% to 9.992%. An investigation of the SIMD parallelization performed by v11.1 of the Intel Fortran Compiler (IFC) on three benchmarks shows that our system achieves speedup with vectorization in all three cases, whereas IFC does not. Finally, a comparison of our approach with an implementation of the Superword Level Parallelization (SLP) algorithm shows that our approach yields a performance improvement of up to 13.78% relative to SLP.
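Illustration (not from the paper): the abstract describes a dynamic programming-based choice between scalar and vector instructions for straight-line code, where packing scalars into short vectors has a cost. The sketch below is a deliberately simplified toy model of that idea, not the authors' algorithm. It assumes a linear chain of dependent "stages", each a group of isomorphic scalar operations, and picks the cheapest mix of scalar and vector forms. The `Stage` type, the `select` function, and the `PACK_COST` and `UNPACK_COST` values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """A group of `width` isomorphic scalar operations (hypothetical model)."""
    name: str
    width: int        # number of scalar ops that one vector op could cover
    scalar_cost: int  # cost of ONE scalar op
    vector_cost: int  # cost of the single vector op covering the whole group


PACK_COST = 2    # assumed cost of packing scalars into a vector register
UNPACK_COST = 2  # assumed cost of extracting scalars from a vector register


def select(stages):
    """Return (total_cost, choices) for a chain of dependent stages.

    Values are assumed to enter and leave the chain in scalar form.
    """
    inf = float("inf")
    # best[mode] = (cost, choices) with the latest result held in `mode`
    best = {"scalar": (0, []), "vector": (inf, [])}
    for st in stages:
        nxt = {}
        for mode in ("scalar", "vector"):
            if mode == "scalar":
                op = st.width * st.scalar_cost
            else:
                op = st.vector_cost
            candidates = []
            for prev, (cost, picks) in best.items():
                if prev == mode:
                    convert = 0            # operand already in the right form
                elif mode == "vector":
                    convert = PACK_COST    # scalar -> vector
                else:
                    convert = UNPACK_COST  # vector -> scalar
                candidates.append((cost + convert + op, picks + [mode]))
            nxt[mode] = min(candidates, key=lambda c: c[0])
        best = nxt
    # Results must end up in scalar form, so a vector ending pays one unpack.
    vec_cost, vec_picks = best["vector"]
    return min([best["scalar"], (vec_cost + UNPACK_COST, vec_picks)],
               key=lambda c: c[0])


chain = [Stage("load", 4, 1, 1), Stage("mul", 4, 1, 1), Stage("add", 4, 1, 1)]
print(select(chain))
```

Under these assumed costs, the three-stage chain is cheapest fully vectorized (cost 7 versus 12 for all-scalar), while a single two-wide operation stays scalar because packing and unpacking outweigh the saving. The paper's algorithm additionally considers shuffles, horizontal operations, and algebraic reassociation, and interacts with scheduling and register allocation, none of which this toy models.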
Author affiliations:	Barik, Rajkishore (Intel Corp., Santa Clara, CA, USA; rajkishore.barik@intel.com); Zhao, Jisheng (Rice Univ., Houston, TX, USA; jisheng.zhao@rice.edu); Sarkar, Vivek (Rice Univ., Houston, TX, USA; vsarkar@rice.edu)
DOI	10.1109/MICRO.2010.38
Discipline	Computer Science
Genre	orig-research
IsPeerReviewed	false
IsScholarly	true
PageCount	12
URI	https://ieeexplore.ieee.org/document/5695537