TOD-Tree: Task-Overlapped Direct Send Tree Image Compositing for Hybrid MPI Parallelism and GPUs

Bibliographic details
Published in:	IEEE Transactions on Visualization and Computer Graphics, Vol. 23, No. 6, pp. 1677-1690
Main authors:	Pascal Grosset, A. V.; Prasad, Manasa; Christensen, Cameron; Knoll, Aaron; Hansen, Charles
Format:	Journal Article
Language:	English
Published:	United States, IEEE, 1 June 2017, The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	Algorithms; Communication; Computation; Computer Science; Data visualization; Distributed volume rendering; Graphics processing units; Image compositing; Loading; Message systems; Nodes; Parallel processing; Rendering (computer graphics); Supercomputers; Workflow
ISSN:	1077-2626, 1941-0506

Description
Summary:	Modern supercomputers have thousands of nodes, each with CPUs and/or GPUs capable of several teraflops. However, the network connecting these nodes is relatively slow, on the order of gigabits per second. For time-critical workloads such as interactive visualization, the bottleneck is no longer computation but communication. In this paper, we present an image compositing algorithm that works on both CPU-only and GPU-accelerated supercomputers and focuses on communication avoidance and overlapping communication with computation at the expense of evenly balancing the workload. The algorithm has three stages: a parallel direct send stage, followed by a tree compositing stage and a gather stage. We compare our algorithm with radix-k and binary-swap from the IceT library in a hybrid OpenMP/MPI setting on the Stampede and Edison supercomputers, show strong scaling results and explain how we generally achieve better performance than these two algorithms. We developed a GPU-based image compositing algorithm where we use CUDA kernels for computation and GPU Direct RDMA for inter-node GPU communication. We tested the algorithm on the Piz Daint GPU-accelerated supercomputer and show that we achieve performance on par with CPUs. Last, we introduce a workflow in which both rendering and compositing are done on the GPU.
Bibliography:	NA0002375; SC0007446; USDOE National Nuclear Security Administration (NNSA)
DOI:	10.1109/TVCG.2016.2542069
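
Illustrative sketch (not part of the catalog record): the summary names three stages, a parallel direct send stage, a tree compositing stage and a gather stage. The single-process Python simulation below shows one way such a pipeline can fit together. It is based only on the summary, so the grouping of ranks, the pairwise tree, and every name in it (`composite`, `group_size`, `blend`, `split`) are assumptions rather than the authors' implementation. It uses no MPI and does not model the overlap of communication with computation that the paper emphasizes.

```python
from typing import List, Tuple

Pixel = Tuple[float, ...]   # premultiplied RGBA, assumed for this sketch
Image = List[Pixel]


def over(front: Pixel, back: Pixel) -> Pixel:
    """Porter-Duff 'over' operator for premultiplied-alpha pixels."""
    t = 1.0 - front[3]
    return tuple(f + t * b for f, b in zip(front, back))


def blend(front: Image, back: Image) -> Image:
    """Composite two equally sized images (or regions), front over back."""
    return [over(f, b) for f, b in zip(front, back)]


def split(image: Image, parts: int) -> List[Image]:
    """Cut an image into `parts` contiguous regions of near-equal size."""
    n = len(image)
    bounds = [n * i // parts for i in range(parts + 1)]
    return [image[bounds[i]:bounds[i + 1]] for i in range(parts)]


def composite(images: List[Image], group_size: int) -> Image:
    """Simulate the three stages; `images` holds one image per rank,
    ordered front to back. The grouping scheme is hypothetical."""
    if not images or group_size < 1:
        raise ValueError("need at least one image and group_size >= 1")

    # Stage 1 (direct send): within each group, every region of the image
    # is composited from the matching pieces of all group members.
    groups = [images[i:i + group_size]
              for i in range(0, len(images), group_size)]
    partial = []
    for group in groups:
        pieces = [split(img, group_size) for img in group]
        regions = []
        for r in range(group_size):
            acc = pieces[0][r]
            for member in pieces[1:]:
                acc = blend(acc, member[r])
            regions.append(acc)
        partial.append(regions)

    # Stage 2 (tree compositing): merge neighbouring groups pairwise,
    # region by region, until a single set of regions remains.
    while len(partial) > 1:
        merged = []
        for i in range(0, len(partial), 2):
            if i + 1 < len(partial):
                merged.append([blend(a, b)
                               for a, b in zip(partial[i], partial[i + 1])])
            else:
                merged.append(partial[i])
        partial = merged

    # Stage 3 (gather): concatenate the regions into the final image.
    return [px for region in partial[0] for px in region]
```

Because the "over" operator is associative, merging contiguous depth-ordered groups yields the same image as compositing all inputs sequentially from front to back, up to floating-point rounding. In a real distributed setting each region in stage 1 would be owned by a different rank, which is where the communication cost arises.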