A 280mV-to-1.1V 256b reconfigurable SIMD vector permutation engine with 2-dimensional shuffle in 22nm CMOS
Energy-efficient SIMD permutation operations are key for maximizing high-performance microprocessor vector datapath utilization in multimedia, graphics, and signal processing workloads [1-3]. A wide SIMD vector permutation engine is required to achieve high-throughput data rearrangement operations o...
| Published in: | 2012 IEEE International Solid-State Circuits Conference, pp. 178-180 |
|---|---|
| Main authors: | Hsu, S.; Agarwal, A.; Anders, M.; Mathew, S.; Kaul, H.; Sheikh, F.; Krishnamurthy, R. |
| Format: | Conference proceeding |
| Language: | English |
| Published: | IEEE, 1 February 2012 |
| Subjects: | |
| ISBN: | 1467303763, 9781467303767 |
| ISSN: | 0193-6530 |
| Online access: | Full text |
| Abstract | Energy-efficient SIMD permutation operations are key for maximizing high-performance microprocessor vector datapath utilization in multimedia, graphics, and signal processing workloads [1-3]. A wide SIMD vector permutation engine is required to achieve high-throughput data rearrangement operations on large data sets, with scaled supply voltages to deliver high energy efficiency. An ultra-low-voltage reconfigurable 4-way to 32-way SIMD vector permutation engine consisting of a 32-entry × 256b 3-read/1-write ported register file with a 256b byte-wise any-to-any permute crossbar for 2-dimensional shuffle is fabricated in 22nm CMOS. The register file integrates a vertical shuffle across multiple entries into read/write operations, and includes clockless static reads with shared P/N dual-ended transmission gate (DETG) writes, improving register file V_MIN by 250mV across PVT variations with a wide dynamic operating range of 280mV-1.1V. The permute crossbar implements an interleaved folded byte-wise multiplexer layout forming an any-to-any fully-connected tree to perform a horizontal shuffle with permute accumulate circuits, and includes vector flip-flops, stacked min-delay buffers, shared gates to average min-sized transistor variation, and ultra-low-voltage split-output (ULVS) level shifters improving logic V_MIN by 150mV, while enabling peak energy efficiency of 585GOPS/W measured at 260mV, 50°C. The permutation engine occupies a dense layout of 0.048mm² (Fig. 10.1.7) while achieving: (i) nominal register file performance of 1.8GHz, 106mW measured at 0.9V, 50°C; (ii) robust register file functionality measured down to 280mV (subthreshold) with peak energy efficiency of 154GOPS/W; (iii) scalable permute crossbar performance of 2.9GHz, 69mW measured at 1.1V, 50°C with deep sub-threshold operation at 240mV, 10MHz consuming 19μW; and (iv) a 64b 4×4 matrix transpose algorithm with 53% energy savings and 42% improved peak throughput of 263Gbps measured at 1.8GHz, 0.9V. |
|---|---|
| Author affiliations | All seven authors (Hsu, S.; Agarwal, A.; Anders, M.; Mathew, S.; Kaul, H.; Sheikh, F.; Krishnamurthy, R.): Intel, Hillsboro, OR, USA |
| DOI | 10.1109/ISSCC.2012.6176966 |
| Discipline | Engineering |
| EISBN | 9781467303774 1467303771 9781467303743 1467303755 1467303747 9781467303750 |
| PublicationTitleAbbrev | ISSCC |
| SubjectTerms | Energy efficiency; Energy measurement; Engines; Frequency measurement; Registers; Vectors; Voltage measurement |
| URI | https://ieeexplore.ieee.org/document/6176966 |

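The abstract describes a 2-dimensional shuffle: a vertical shuffle across register-file entries combined with a byte-wise any-to-any horizontal shuffle, applied to a 64b 4×4 matrix transpose. The following is a minimal behavioral sketch of that idea in Python, not the circuit or the authors' algorithm. The function names (`horizontal_shuffle`, `vertical_shuffle`, `rotate_select`, `transpose_4x4`) and the skew, gather, unskew schedule are assumptions made for illustration only; the record does not say how the transpose is sequenced on the hardware.

```python
LANE_BYTES = 8                     # one 64b matrix element
WORD_BYTES = 32                    # one 256b register-file entry
LANES = WORD_BYTES // LANE_BYTES   # four 64b lanes per entry


def horizontal_shuffle(word, select):
    """Byte-wise any-to-any permute: output byte i is input byte select[i]."""
    assert len(word) == WORD_BYTES and len(select) == WORD_BYTES
    return bytes(word[s] for s in select)


def vertical_shuffle(regfile, entry_per_lane):
    """Build one word whose lane l is read from entry entry_per_lane[l].

    Lanes keep their horizontal position; only the source entry varies.
    """
    out = bytearray()
    for lane, entry in enumerate(entry_per_lane):
        lo = lane * LANE_BYTES
        out += regfile[entry][lo:lo + LANE_BYTES]
    return bytes(out)


def rotate_select(shift):
    """Select pattern that moves input lane j to lane (j + shift) % LANES."""
    return [((i // LANE_BYTES - shift) % LANES) * LANE_BYTES + i % LANE_BYTES
            for i in range(WORD_BYTES)]


def transpose_4x4(rows):
    """Transpose a 4x4 matrix of 64b elements stored one row per entry.

    Assumed schedule (illustrative, not taken from the paper):
      1. horizontal: rotate row k right by k lanes (skew),
      2. vertical: for output row j, lane l is read from entry (l - j) % 4,
      3. horizontal: rotate the gathered word left by j lanes (unskew).
    """
    skewed = [horizontal_shuffle(rows[k], rotate_select(k))
              for k in range(LANES)]
    result = []
    for j in range(LANES):
        gathered = vertical_shuffle(
            skewed, [(lane - j) % LANES for lane in range(LANES)])
        result.append(horizontal_shuffle(gathered, rotate_select(-j)))
    return result


def make_matrix():
    """Example matrix with element (r, c) holding the value 4*r + c."""
    return [b"".join((4 * r + c).to_bytes(LANE_BYTES, "little")
                     for c in range(LANES))
            for r in range(LANES)]
```

In this model the horizontal step alone cannot move data between entries and the vertical step alone cannot move data between lanes, which is why the sketch needs both dimensions to complete a transpose.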
