An Efficient Implementation of the Bellman-Ford Algorithm for Kepler GPU Architectures

Bibliographic Details
Published in:	IEEE Transactions on Parallel and Distributed Systems, Vol. 27, no. 8, pp. 2222-2233
Main Authors:	Busato, Federico; Bombieri, Nicola
Format:	Journal Article
Language:	English
Published:	New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.08.2016
Subjects:	Algorithms; Architecture; Bellman-Ford; Computation; Computer architecture; CUDA; GPU; Graphics processing units; Graphs; Heuristic algorithms; Instruction sets; Kepler; Kernel; Optimization; Parallel processing; Redundant; Shortest-path problems; SSSP; Weight reduction
ISSN:	1045-9219, 1558-2183

Description
Summary:	Finding the shortest paths from a single source to all other vertices is a common problem in graph analysis. The Bellman-Ford algorithm solves this single-source shortest path (SSSP) problem and is better suited to parallelization on many-core architectures. Nevertheless, its high degree of parallelism comes at the cost of low work efficiency: compared to similar algorithms in the literature (e.g., Dijkstra's), it involves much more redundant work and consequently wastes power. This article presents a parallel implementation of the Bellman-Ford algorithm that exploits the architectural characteristics of recent GPU architectures (i.e., NVIDIA Kepler, Maxwell) to improve both performance and work efficiency. The article presents different optimizations to the implementation, oriented both to the algorithm and to the architecture. The experimental results show that the proposed implementation provides an average speedup of 5× over the most efficient existing parallel implementations for SSSP, that it works on graphs where those implementations cannot work or are inefficient (e.g., graphs with negative weight edges, sparse graphs), and that it appreciably reduces the redundant work caused by the parallelization process.
DOI:	10.1109/TPDS.2015.2485994