ILP and TLP in shared memory applications: A limit study

Detailed bibliography
Published in: PACT '14: Proceedings of the 23rd International Conference on Parallel Architectures and Compilation Techniques, August 24-27, 2014, Edmonton, AB, Canada, pp. 113-125
Main authors: Fatehi, Ehsan; Gratz, Paul V.
Format: Conference paper
Language: English
Published: ACM, August 1, 2014
Abstract With the breakdown of Dennard scaling, future processor designs will be at the mercy of power limits as Chip MultiProcessor (CMP) designs scale out to many cores. It is therefore critical that future CMPs be designed for optimal performance efficiency with respect to power. Characterizing future workloads is imperative to ensure maximum performance per Watt consumed; in particular, a detailed analysis of emerging workloads is needed to understand their power and performance tradeoffs with respect to hardware. In this paper, we conduct a limit study that simultaneously analyzes the two dominant forms of parallelism exploited by modern computer architectures: Instruction Level Parallelism (ILP) and Thread Level Parallelism (TLP). This study gives insight into the upper bounds of performance that future architectures can achieve. Furthermore, it identifies the bottlenecks of emerging workloads. To the best of our knowledge, our work is the first to combine both forms of parallelism in a single study of modern applications. We evaluate the PARSEC multithreaded benchmark suite using a specialized trace-driven simulator. We make several contributions describing the high-level behavior of next-generation applications. For example, we show that these applications contain up to 929× more ILP than current real machines extract. We then show how exploitable ILP is affected by breaking the application into increasing numbers of threads (exploiting TLP), by instruction window size, by realistic branch prediction, by realistic memory latency, and by thread dependencies. Our examination shows that these benchmarks differ vastly from one another. As a result, we expect that no single homogeneous microarchitecture will work optimally for all of them, which argues for reconfigurable, heterogeneous designs.
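An ILP limit study typically idealizes the machine: each instruction issues as soon as its true data dependences are satisfied, and exploitable ILP is the instruction count divided by the length of the resulting dataflow critical path. The Python sketch below illustrates that metric on a toy register-dependency trace, with an optional finite instruction window. It is a minimal illustration under stated assumptions, not the authors' trace-driven simulator. The trace format, the unit latencies, the window rule (instruction i may not issue until instruction i - window has finished), and the names `Instr` and `ilp_limit` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Instr:
    """One dynamic instruction in a toy trace (hypothetical format)."""
    dests: list[str] = field(default_factory=list)  # registers written
    srcs: list[str] = field(default_factory=list)   # registers read
    latency: int = 1                                # cycles until the result is ready (>= 1)


def ilp_limit(trace: list[Instr], window: Optional[int] = None) -> float:
    """Idealized ILP = instructions / critical-path length in cycles.

    Assumptions: perfect branch prediction, perfect register renaming
    (only true read-after-write dependences delay issue), and unlimited
    functional units. With a finite `window`, instruction i may not
    issue until instruction i - window has finished (a simplified,
    hypothetical window model).
    """
    if not trace:
        return 0.0
    ready: dict[str, int] = {}  # register -> cycle its latest value is ready
    finish: list[int] = []      # finish cycle of each instruction, in trace order
    for i, ins in enumerate(trace):
        # Earliest issue cycle allowed by true data dependences.
        start = max((ready.get(r, 0) for r in ins.srcs), default=0)
        # Finite window: wait for the instruction `window` slots earlier.
        if window is not None and i >= window:
            start = max(start, finish[i - window])
        end = start + ins.latency
        finish.append(end)
        for r in ins.dests:
            ready[r] = end      # renaming: later writers never wait on WAR/WAW hazards
    return len(trace) / max(finish)


if __name__ == "__main__":
    # Toy trace: four independent producers, then a two-instruction dependent chain.
    demo = [Instr(["r1"]), Instr(["r2"]), Instr(["r3"]), Instr(["r4"]),
            Instr(["r5"], ["r1", "r2"]), Instr(["r6"], ["r5", "r3"])]
    print(ilp_limit(demo))            # 2.0 (unbounded window)
    print(ilp_limit(demo, window=2))  # 1.5
```

Shrinking the window lowers the measured ILP, and a window of one serializes this trace to an ILP of 1.0. Assigning a larger `latency` to loads is one crude way to approximate realistic memory latency. Modeling realistic branch prediction or cross-thread dependencies would need extra trace information that this sketch does not assume.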
Author affiliations
– 1. Ehsan Fatehi (efatehi@tamu.edu), Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
– 2. Paul V. Gratz (pgratz@tamu.edu), Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
ContentType Conference Proceeding
DOI 10.1145/2628071.2628093
Discipline Computer Science
EISBN 9781450328098
1450328091
EndPage 125
ISICitedReferencesCount 7
PageCount 13
PublicationDate 2014-Aug.
PublicationTitleAbbrev PACT
Publisher ACM
StartPage 113
SubjectTerms Benchmark testing
Instruction sets
Instruction-Level Parallelism (ILP)
Limits
Multicore processing
Parallel processing
PThreads
Thread-Level Parallelism (TLP)
Transistors
Upper bound
URI https://ieeexplore.ieee.org/document/7855893