Achieving Scalability in a k-NN Multi-GPU Network Service with Centaur

Centaur is a GPU-centric architecture for building a low-latency approximate k-Nearest-Neighbors network server. We implement a multi-GPU distributed data flow runtime which enables efficient and scalable network request processing on GPUs. The runtime eliminates GPU management overheads from the CP...

Celý popis

Uloženo v:
Podrobná bibliografie
Vydáno v:Proceedings / International Conference on Parallel Architectures and Compilation Techniques s. 245 - 257
Hlavní autoři: Watad, Amir, Libov, Alexander, Shacham, Ohad, Bortnikov, Edward, Silberstein, Mark
Médium: Konferenční příspěvek
Jazyk:angličtina
Vydáno: IEEE 01.09.2019
Témata:
ISSN:2641-7936
On-line přístup:Získat plný text
Tagy: Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
Popis
Shrnutí:Centaur is a GPU-centric architecture for building a low-latency approximate k-Nearest-Neighbors network server. We implement a multi-GPU distributed data flow runtime which enables efficient and scalable network request processing on GPUs. The runtime eliminates GPU management overheads from the CPU, making the server throughput and response time largely agnostic to the CPU load, speed or the number of dedicated CPU cores. Our experiments systems show that our server achieves near-perfect scaling for 16 GPUs, beating the throughput of a highly-optimized CPU-driven server by 35% while maintaining about 2msec average request latency. Furthermore, it requires only a single CPU core to run, achieving over an order of magnitude higher throughput than the standard CPU-driven server architecture in this setting.
ISSN:2641-7936
DOI:10.1109/PACT.2019.00027