Efficient hardware/software co-designed schemes for low-power processors

Detailed bibliography
Title: Efficient hardware/software co-designed schemes for low-power processors
Authors: López Muñoz, Pedro
Contributors: University/Department: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
Thesis Advisors: Latorre Salinas, Fernando; Gibert Codina, Enric
Source: TDX (Tesis Doctorals en Xarxa)
Publisher Information: Universitat Politècnica de Catalunya, 2014.
Publication Year: 2014
Physical Description: 193 p.
Original Identifier: B 16002-2014
Description: Further improvements in single-thread performance now come only at the expense of significantly higher power consumption. Industry and the scientific community have therefore adopted multi-core chips as a proven way to improve performance within a limited power budget. However, the number of cores that can be integrated on a single die is limited by area and power constraints, and so is the thread-level parallelism (TLP) that can be exploited. One way to keep increasing the core count is to reduce the complexity of each individual core, at the cost of sacrificing instruction-level parallelism (ILP). This creates a design trade-off: dedicate the available die area to many simple cores and favor TLP, or to fewer cores and favor ILP. Among the solutions already studied in the literature, we selected hybrid hardware/software co-designed processors, which provide high single-thread performance on simple low-power cores through a software dynamic binary optimizer tightly coupled with the underlying hardware. We believe that hardware/software co-designed processors deserve special attention in the design of multi-core systems because they allow implementing multiple simple cores that maximize TLP while sustaining better ILP than conventional pure-hardware approaches. In particular, this thesis explores three techniques that address some of the most relevant challenges in the design of a simple low-power hardware/software co-designed processor.
The first technique is a profiling mechanism, named the LIU Profiler, that detects hot code regions. It consists of a small hardware table that uses a novel replacement policy aimed at detecting hot code. This simple hardware structure lets the software apply heuristics when building code regions and applying optimizations. The LIU Profiler achieves 85.5% code coverage detection, whereas similar profilers implementing traditional replacement policies reach up to 60% coverage while requiring a table four times larger. Moreover, the LIU Profiler increases the total area of a simple low-power processor by only 1% and consumes less than 0.87% of the total processor power. It thus enables better single-thread performance without significantly increasing the area and power of the processor.
The second technique is a rollback scheme, named HRC, that supports code reordering and aggressive speculative optimizations on hot code regions. It combines software and hardware mechanisms to checkpoint and recover the architectural register state of the processor. Compared with pure-hardware solutions that require doubling the number of registers, the proposal reduces processor area by 11% and register file power consumption by 24.4%, at the cost of only a 1% performance degradation.
The third technique is a loop parallelization (LP) scheme that uses the software layer to dynamically detect loops and prepare them to execute multiple iterations in parallel on Simultaneous Multi-Threading (SMT) threads. Dedicated loop-parallelization binary optimizations are applied to speed up loop execution. The LP scheme uses a novel fine-grain register communication and thread dynamic register binding technique, together with existing processor resources. It introduces small overheads, and even small loops and loops that iterate only a few times obtain significant performance improvements. Loop execution time improves by more than 16.5% compared with a fully optimized baseline. LP contributes positively to the integration of a large number of simple cores on the same die and allows those cores to cooperate to some extent to continue exploiting ILP when necessary.
Description (Translated): DOCTORAT EN ARQUITECTURA I TECNOLOGIA DE COMPUTADORS (Pla 1998)
Document Type: Dissertation/Thesis
File Description: application/pdf
Language: English
DOI: 10.5821/dissertation-2117-95292
Access URL: http://hdl.handle.net/10803/144619
https://dx.doi.org/10.5821/dissertation-2117-95292
Rights: WARNING. Access to the contents of this doctoral thesis and its use must respect the rights of the author. It may be used for consultation or personal study, as well as in research and teaching activities or materials, under the terms established in Article 32 of the Consolidated Text of the Intellectual Property Law (RDL 1/1996). Other uses require the prior and express authorization of the author. In any case, when using its contents, the author's full name and the title of the doctoral thesis must be clearly indicated. Reproduction or other forms of exploitation for profit are not authorized, nor is public communication from a site other than the TDX service. Presenting its content in a window or frame external to TDX (framing) is also not authorized. This reservation of rights covers both the contents of the thesis and its abstracts and indexes.
Accession Number: edstdx.10803.144619
Database: TDX
RecordInfo BibRecord:
  BibEntity:
    Identifiers:
      – Type: doi
        Value: 10.5821/dissertation-2117-95292
    Languages:
      – Text: English
    PhysicalDescription:
      Pagination:
        PageCount: 193
    Titles:
      – TitleFull: Efficient hardware/software co-designed schemes for low-power processors
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: López Muñoz, Pedro
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 17
              M: 03
              Type: published
              Y: 2014