Efficient hardware/software co-designed schemes for low-power processors

Detailed Bibliography
Title:	Efficient hardware/software co-designed schemes for low-power processors
Authors:	López Muñoz, Pedro
Contributors:	University/Department: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
Thesis Advisors:	Latorre Salinas, Fernando; Gibert Codina, Enric
Source:	TDX (Tesis Doctorals en Xarxa)
Publisher Information:	Universitat Politècnica de Catalunya, 2014.
Publication Year:	2014
Physical Description:	193 p.
Original Identifier:	B 16002-2014
Description:	Single-thread performance has reached a point where further improvement comes only at the expense of significantly higher power consumption. Industry and the scientific community have therefore adopted multi-core chips as a proven way to improve performance within a limited power budget. However, the number of cores that can be integrated into a single die is limited by area and power constraints, and so is the thread-level parallelism (TLP) that can be exploited. One way to keep increasing the core count is to reduce the complexity of each core, at the cost of sacrificing instruction-level parallelism (ILP). This is a design trade-off: the available die area can be devoted to many simple cores, favoring TLP, or to fewer cores, favoring ILP. Among the solutions studied in the literature to address this challenge, we selected hybrid hardware/software co-designed processors. This approach provides high single-thread performance on simple low-power cores through a software dynamic binary optimizer tightly coupled with the underlying hardware. We therefore believe that hardware/software co-designed processors deserve special attention in the design of multi-core systems, since they allow implementing multiple simple cores that maximize TLP while sustaining better ILP than conventional pure-hardware approaches. In particular, this thesis explores three techniques that address some of the most relevant challenges in the design of a simple low-power hardware/software co-designed processor.
The first technique is a profiling mechanism, named the LIU Profiler, that detects hot code regions. It consists of a small hardware table that uses a novel replacement policy aimed at detecting hot code. This simple hardware structure lets the software apply heuristics when building code regions and applying optimizations. The LIU Profiler achieves 85.5% code coverage detection, whereas similar profilers that implement traditional replacement policies reach at most 60% coverage while requiring a table four times larger. Moreover, the LIU Profiler increases the total area of a simple low-power processor by only 1% and consumes less than 0.87% of the total processor power. It thus enables better single-thread performance without significantly increasing the area or power of the processor.
The second technique is a rollback scheme that supports code reordering and aggressive speculative optimizations on hot code regions. Named HRC, it combines software and hardware mechanisms to checkpoint and recover the architectural register state of the processor. Compared with pure-hardware solutions that require doubling the number of registers, HRC reduces the processor area by 11% and the register file power consumption by 24.4%, at the cost of only a 1% performance degradation.
The third technique is a loop parallelization (LP) scheme that uses the software layer to dynamically detect loops and prepare them to execute multiple iterations in parallel on Simultaneous Multi-Threading threads. These are optimized with dedicated loop parallelization binary optimizations to speed up loop execution. The LP scheme uses novel fine-grain register communication and thread dynamic register binding techniques, together with already existing processor resources. It introduces only small overheads, so even small loops and loops that iterate just a few times obtain significant performance improvements. Loop execution time improves by more than 16.5% compared with a fully optimized baseline. LP contributes positively to the integration of a large number of simple cores on the same die and allows those cores to cooperate, to some extent, to continue exploiting ILP when necessary.
Description (Translated):	DOCTORAT EN ARQUITECTURA I TECNOLOGIA DE COMPUTADORS (Pla 1998)
Document Type:	Dissertation/Thesis
File Description:	application/pdf
Language:	English
DOI:	10.5821/dissertation-2117-95292
Access URL:	http://hdl.handle.net/10803/144619 https://dx.doi.org/10.5821/dissertation-2117-95292
Rights:	NOTICE. Access to the contents of this doctoral thesis and its use must respect the rights of the author. It may be used for consultation or personal study, as well as in research and teaching activities or materials, under the terms established in Art. 32 of the Consolidated Text of the Intellectual Property Law (RDL 1/1996). Any other use requires the prior and express authorization of the author. In all cases, use of its contents must clearly indicate the full name of the author and the title of the doctoral thesis. Its reproduction or other forms of exploitation for profit are not authorized, nor is its public communication from a site outside the TDX service. Presenting its content in a window or frame outside TDX (framing) is also not authorized. This reservation of rights applies to the contents of the thesis as well as to its abstracts and indexes.
Accession Number:	edstdx.10803.144619
Database:	TDX

Publication Date:	17 March 2014