TiNy Threads on BlueGene/P: Exploring Many-Core Parallelisms Beyond The Traditional OS
| Title: | TiNy Threads on BlueGene/P: Exploring Many-Core Parallelisms Beyond The Traditional OS |
|---|---|
| Authors: | Ong Ye, Robert Pavel, Aaron L, Guang R. Gao |
| Contributors: | The Pennsylvania State University CiteSeerX Archives |
| Source: | http://www.capsl.udel.edu/pub/doc/memos/memo097.pdf |
| Publication year: | 2010 |
| Collection: | CiteSeerX |
| Description: | Operating Systems have been considered as a cornerstone of the modern computer system, and the conventional operating system model targets computers designed around the sequential execution model. However, with the rapid progress of the multi-core/many-core technologies, we argue that OSes must be adapted to the underlying hardware platform to fully exploit parallelism. To illustrate this, our paper reports a study on how to perform such an adaptation for the IBM BlueGene/P multi-core system. This paper’s major contributions are threefold. First, we have proposed a strategy to isolate the traditional OS functions to a single core of the BG/P multi-core chip, leaving the management of the remaining cores to a runtime software that is optimized to realize the parallel semantics of the user application according to a parallel program execution model. Second, we have ported the TNT (TiNy Thread) execution model to allow for further utilization of the BG/P compute cores. Finally, we have expanded the design framework described above to a multi-chip system designed for scalability to a large number of chips. An implementation of our method has been completed on the Surveyor BG/P machine operated by Argonne National Laboratory. Our experimental results provide insight into the strengths of this approach: (1) The performance of the TNT thread system shows comparable speedup to that of Pthreads running on the same hardware; (2) The distributed shared memory operates at 95% of the experimental peak performance of the machine, with distance between nodes not being a sensitive factor; (3) The cost of thread creation shows a linear relationship as threads increase; (4) The cost of waiting at a barrier is constant and independent of the number of threads involved. |
| Document type: | text |
| File description: | application/pdf |
| Language: | English |
| Relation: | http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.598.9285; http://www.capsl.udel.edu/pub/doc/memos/memo097.pdf |
| Availability: | http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.598.9285; http://www.capsl.udel.edu/pub/doc/memos/memo097.pdf |
| Rights: | Metadata may be used without restrictions as long as the oai identifier remains attached to it. |
| Accession number: | edsbas.5850C45F |
| Database: | BASE |