ILP and TLP in shared memory applications: A limit study
With the breakdown of Dennard scaling, future processor designs will be at the mercy of power limits as Chip MultiProcessor (CMP) designs scale out to many-cores. It is critical, therefore, that future CMPs be optimally designed in terms of performance efficiency with respect to power. A characteriz...
| Published in: | PACT '14 : proceedings of the 23rd International Conference on Parallel Architectures and Compilation Techniques : August 24-27, 2014, Edmonton, AB, Canada, pp. 113-125 |
|---|---|
| Main authors: | Fatehi, Ehsan; Gratz, Paul V. |
| Format: | Conference paper |
| Language: | English |
| Published: | ACM, 1 August 2014 |
| Subjects: | |
| Abstract | With the breakdown of Dennard scaling, future processor designs will be at the mercy of power limits as Chip MultiProcessor (CMP) designs scale out to many cores. It is critical, therefore, that future CMPs be optimally designed in terms of performance efficiency with respect to power. A characterization analysis of future workloads is imperative to ensure maximum returns of performance per Watt consumed. Hence, a detailed analysis of emerging workloads is necessary to understand their characteristics with respect to hardware in terms of power and performance tradeoffs. In this paper, we conduct a limit study simultaneously analyzing the two dominant forms of parallelism exploited by modern computer architectures: Instruction Level Parallelism (ILP) and Thread Level Parallelism (TLP). This study gives insights into the upper bounds of performance that future architectures can achieve. Furthermore, it identifies the bottlenecks of emerging workloads. To the best of our knowledge, our work is the first study that combines the two forms of parallelism into one study with modern applications. We evaluate the PARSEC multithreaded benchmark suite using a specialized trace-driven simulator. We make several contributions describing the high-level behavior of next-generation applications. For example, we show these applications contain up to 929× more ILP than what is currently being extracted from real machines. We then show the effects on exploitable ILP of breaking the application into increasing numbers of threads (exploiting TLP), instruction window size, realistic branch prediction, realistic memory latency, and thread dependencies. Our examination shows that these benchmarks differ vastly from one another. As a result, we expect no single, homogeneous micro-architecture will work optimally for all, arguing for reconfigurable, heterogeneous designs. |
|---|---|
| Author | Gratz, Paul V.; Fatehi, Ehsan |
| Author_xml | 1. Fatehi, Ehsan (efatehi@tamu.edu), Electr. & Comput. Eng., Texas A&M Univ., College Station, TX, USA; 2. Gratz, Paul V. (pgratz@tamu.edu), Electr. & Comput. Eng., Texas A&M Univ., College Station, TX, USA |
| ContentType | Conference Proceeding |
| DOI | 10.1145/2628071.2628093 |
| Discipline | Computer Science |
| EISBN | 9781450328098; 1450328091 |
| EndPage | 125 |
| ExternalDocumentID | 7855893 |
| Genre | orig-research |
| ISICitedReferencesCount | 7 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| PageCount | 13 |
| PublicationCentury | 2000 |
| PublicationDate | 2014-Aug. |
| PublicationDateYYYYMMDD | 2014-08-01 |
| PublicationDate_xml | – month: 08 year: 2014 text: 2014-Aug. |
| PublicationDecade | 2010 |
| PublicationTitle | PACT '14 : proceedings of the 23rd International Conference on Parallel Architectures and Compilation Techniques : August 24-27, 2014, Edmonton, AB, Canada |
| PublicationTitleAbbrev | PACT |
| PublicationYear | 2014 |
| Publisher | ACM |
| Publisher_xml | – name: ACM |
| StartPage | 113 |
| SubjectTerms | Benchmark testing; Instruction sets; Instruction-Level Parallelism (ILP); Limits; Multicore processing; Parallel processing; PThreads; Thread-Level Parallelism (TLP); Transistors; Upper bound |
| Title | ILP and TLP in shared memory applications: A limit study |
| URI | https://ieeexplore.ieee.org/document/7855893 |
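
The abstract describes measuring exploitable ILP from instruction traces while varying the instruction window size. As a rough illustration of that kind of limit calculation, and not the authors' simulator, the sketch below schedules a dependence trace in dataflow order. Everything in it is an assumption made for illustration: the function name `ilp_limit`, the trace encoding (each instruction lists the indices of earlier instructions it depends on), unit instruction latency, perfect branch prediction and memory, and a window modeled as an in-order-retiring buffer of `window` entries.

```python
from typing import Optional, Sequence


def ilp_limit(deps: Sequence[Sequence[int]], window: Optional[int] = None) -> float:
    """Return instructions per cycle for a dependence trace (hypothetical model).

    deps[i] lists the indices (all < i) of the instructions that produce
    inputs for instruction i. Every instruction takes one cycle.
    window=None models an unbounded instruction window.
    """
    if window is not None and window < 1:
        raise ValueError("window must be at least 1")
    n = len(deps)
    if n == 0:
        return 0.0

    finish = [0] * n  # cycle at which instruction i completes
    retire = [0] * n  # cycle at which instruction i leaves the window (in order)
    for i, producers in enumerate(deps):
        # A slot frees up only when instruction i - window has retired.
        enter = 0 if window is None or i < window else retire[i - window]
        # Inputs are ready once every producer has completed.
        ready = max((finish[p] for p in producers), default=0)
        finish[i] = max(enter, ready) + 1
        retire[i] = finish[i] if i == 0 else max(finish[i], retire[i - 1])

    # Total cycles is the retirement time of the last instruction.
    return n / retire[-1]


if __name__ == "__main__":
    independent = [[] for _ in range(8)]   # no dependences at all
    chain = [[], [0], [1], [2]]            # fully serial dependence chain
    print(ilp_limit(independent))            # unbounded window
    print(ilp_limit(independent, window=2))  # window caps the parallelism
    print(ilp_limit(chain))                  # serial chain cannot overlap
```

In this toy model, eight independent instructions reach an ILP of 8 with an unbounded window and 2 with a two-entry window, while a serial chain stays at 1 regardless of window size. That is the qualitative effect of window size the abstract refers to; realistic branch prediction, memory latency, and thread dependencies would each add further constraints on when an instruction may start.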