Transforming TLP into DLP with the dynamic inter-thread vectorization architecture ; Transformer le TLP en DLP avec l'architecture de vectorisation dynamique inter-thread
Saved in:
| Title: | Transforming TLP into DLP with the dynamic inter-thread vectorization architecture ; Transformer le TLP en DLP avec l'architecture de vectorisation dynamique inter-thread |
|---|---|
| Author: | Kalathingal, Sajith |
| Other Authors: | Inria Rennes – Bretagne Atlantique, Institut National de Recherche en Informatique et en Automatique (Inria); Amdahl's Law is Forever (ALF) team; ARCHITECTURE department (IRISA-D3), Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA); Université de Rennes (UR); Institut National des Sciences Appliquées - Rennes (INSA Rennes); Université de Bretagne Sud (UBS); École normale supérieure - Rennes (ENS Rennes); Télécom Bretagne; CentraleSupélec; Centre National de la Recherche Scientifique (CNRS); André Seznec; Caroline Collange |
| Source: | https://theses.hal.science/tel-01426915 ; Hardware Architecture [cs.AR]. Université de Rennes, 2016. English. ⟨NNT : 2016REN1S133⟩. |
| Publisher Information: | HAL CCSD |
| Publication Year: | 2016 |
| Collection: | Université de Rennes 1: Publications scientifiques (HAL) |
| Keywords: | Simultaneous Multi-Threading, SPMD, SIMD, Dynamic inter-Thread vectorization, Microprocessor architectures, Multi-Threading simultané, Vectorisation dynamique, Architectures des microprocesseurs, [INFO.INFO-AR]Computer Science [cs]/Hardware Architecture [cs.AR] |
| Description: | Many modern microprocessors implement Simultaneous Multi-Threading (SMT) to improve the overall efficiency of superscalar CPUs. SMT hides long-latency operations by executing instructions from multiple threads simultaneously. SMT may execute threads of different processes, threads of the same process, or any combination of the two. When the threads are from the same process, they often execute the same instructions on different data, especially in the case of Single-Program Multiple-Data (SPMD) applications. Traditional SMT architectures exploit thread-level parallelism and, with SIMD execution units, also support explicit data-level parallelism. SIMD execution is power-efficient because it significantly reduces the total number of instructions required to execute a program; the reduction depends on the width of the SIMD execution units and on the vectorization efficiency. Static vectorization efficiency depends on programmer skill and on the compiler, and programs that are not written with vectorization in mind lead to inefficient static vectorization by the compiler. In this thesis, we propose the Dynamic Inter-Thread Vectorization Architecture (DITVA) to leverage the implicit data-level parallelism in SPMD applications by assembling dynamic vector instructions at runtime. DITVA extends an SIMD-enabled in-order SMT processor with an inter-thread vectorization execution mode. When threads are running in lockstep, similar instructions across threads are dynamically vectorized to form SIMD instructions. Threads on convergent paths share an instruction stream; when all threads are on the convergent path, there is only a single instruction stream. To optimize performance in such cases, DITVA statically groups threads into fixed-size, independently scheduled warps. DITVA leverages existing SIMD units and maintains binary compatibility with existing CPU architectures. |
| Publication Type: | doctoral or postdoctoral thesis |
| Language: | English |
| Relation: | NNT: 2016REN1S133 |
| Availability: | https://theses.hal.science/tel-01426915 https://theses.hal.science/tel-01426915v3/document https://theses.hal.science/tel-01426915v3/file/KALATHINGAL_Sajith.pdf |
| Rights: | info:eu-repo/semantics/OpenAccess |
| Document Code: | edsbas.44C83E9A |
| Database: | BASE |
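The abstract above describes a hardware mechanism: threads of an SPMD program are statically grouped into fixed-size warps, and whenever the threads of a warp run in lockstep (same program counter), their per-thread instructions are fused into a single SIMD issue, while divergence falls back to separate issues. The Python sketch below is purely illustrative and not taken from the thesis; the warp size, class names, and mask-based divergence handling are assumptions chosen only to make the lockstep-fusion idea concrete.

```python
# Toy model of dynamic inter-thread vectorization (illustrative only).
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List

WARP_SIZE = 4  # assumed fixed warp width; a design parameter, not a value from the thesis

@dataclass
class Thread:
    tid: int        # thread identifier within the warp
    pc: int = 0     # current program counter
    active: bool = True

@dataclass
class Warp:
    threads: List[Thread] = field(default_factory=list)

    def issue_cycle(self):
        """Group active threads by PC and emit one fused issue per PC group."""
        groups = defaultdict(list)
        for t in self.threads:
            if t.active:
                groups[t.pc].append(t.tid)
        issues = []
        for pc, tids in groups.items():
            # A group containing every thread of the warp models a dynamically
            # vectorized (DITVA-style) SIMD issue; smaller groups model branch
            # divergence and carry an execution mask instead.
            mask = [t.tid in tids for t in self.threads]
            issues.append({"pc": hex(pc),
                           "mask": mask,
                           "vectorized": len(tids) == len(self.threads)})
        return issues

if __name__ == "__main__":
    # Four lockstep threads -> a single vectorized issue.
    warp = Warp([Thread(tid=i, pc=0x400) for i in range(WARP_SIZE)])
    print(warp.issue_cycle())
    # One thread takes a different branch -> two mask-partitioned issues.
    warp.threads[2].pc = 0x404
    print(warp.issue_cycle())
```

Running the sketch prints one vectorized issue while all four threads share a PC, and two mask-partitioned issues once one thread diverges, mirroring the convergent-path versus divergent-path behaviour described in the abstract.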