Hardware and software mechanisms for multithreading in uniprocessors and heterogeneous multiprocessors

Bibliographic details
Main author: Bradford, Jeffrey Powers
Format: Dissertation
Language: English
Published: ProQuest Dissertations & Theses 01.01.2001
ISBN: 9780493508450, 0493508457
Description
Summary: This thesis proposes, develops, and evaluates hardware and software mechanisms that enhance the efficiency and performance of multithreading in uniprocessors and in heterogeneous multiprocessors. Hardware synchronization mechanisms are shown via simulation to provide a performance improvement between 0% and 400%, depending on the workload and the synchronization frequency, for a constrained simulation model. Further results show the decrease in available parallelism with increasing synchronization overhead. In addition, a VHDL implementation of the functionality required to support hardware synchronization is discussed. Novel context-switch criteria are proposed that allow processors to better tolerate memory-access latency. While this advantage has been discussed previously for multithreaded processors, this work is the first to examine the performance of context-switch criteria that are based on architectural features used to provide some latency tolerance, such as out-of-order dispatch and lockup-free caches. Results from a detailed multiprocessor simulator show a performance improvement of up to 35% over no multithreading, although many of the criteria examined result in a performance decrease. “Virtual Processors” are shown to provide a performance advantage for applications that have been parallelized for a homogeneous multiprocessor but are executing on a heterogeneous multiprocessor. Three additional modifications are also discussed that provide ease-of-use or ease-of-design advantages: more efficient interrupt support, starting and stopping threads, and entry to the operating system. The performance improvement is measured using several scientific programs (from the SPLASH-2 benchmark suite) and one commercial program (C4.5, a decision tree induction application).
This work is the first to use C4.5 as a benchmark application: thus, this thesis presents the first complete characterization of the memory hierarchy behavior of C4.5, presents the first parallelization of decision tree induction optimized for a ccNUMA architecture, characterizes the parallel version, and examines decision tree induction as a possible benchmark application.