Improving matrix-based dynamic programming on massively parallel accelerators

Bibliographic Details
Published in:	Information systems (Oxford) Vol. 64; pp. 175-193
Main Authors:	Bednárek, David, Brabec, Michal, Kruliš, Martin
Format:	Journal Article
Language:	English
Published:	Elsevier Ltd, 1 March 2017
Subjects:	Dynamic programming; Dynamic time warping; Edit distance; GPU; Intel Xeon Phi; Multicore; Parallel
ISSN:	0306-4379, 1873-6076
Online Access:	Full text

Description
Summary:	Dynamic programming techniques are well-established and employed by various practical algorithms, including the edit-distance and dynamic time warping algorithms. These algorithms usually operate in an iteration-based manner, where new values are computed from the values of the previous iteration. The data dependencies enforce synchronization, which limits the possibilities for internal parallel processing. In this paper, we investigate parallel approaches to processing matrix-based dynamic programming algorithms on modern multicore CPUs, Intel Xeon Phi accelerators, and general-purpose GPUs. We address both the problem of computing a single distance on large inputs and the problem of computing a number of distances on smaller inputs simultaneously (e.g., when a similarity query is being resolved). Our proposed solutions yielded significant improvements in performance and achieved a speedup of two orders of magnitude when compared to the serial baseline.
• Dynamic programming algorithms with matrix organization (e.g., Levenshtein distance).
• Employing task parallelism and SIMD/SIMT vectorization.
• Proposed hierarchical algorithm optimized for CPUs, Intel Xeon Phi devices, and GPUs.
• Can be efficiently parallelized if inputs are large or many distances are computed.
• Experiments also determine optimal configurations for current hardware.
DOI:	10.1016/j.is.2016.06.001
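
Illustration:	The data dependencies mentioned in the summary can be made concrete with a small sketch. It is not the hierarchical algorithm proposed in the paper; it only assumes the textbook Levenshtein recurrence, and the function names levenshtein_rowwise and levenshtein_wavefront are hypothetical. Each cell depends on its upper, left, and upper-left neighbours, so all cells on one anti-diagonal (i + j = k) are mutually independent and could, in principle, be computed in parallel or in SIMD lanes. Here the inner loop stays serial and merely marks where that parallelism would apply.

```python
def levenshtein_rowwise(a: str, b: str) -> int:
    """Serial baseline: fill the matrix row by row, keeping two rows."""
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(b)]


def levenshtein_wavefront(a: str, b: str) -> int:
    """Same recurrence, traversed anti-diagonal by anti-diagonal."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    # Anti-diagonal k holds the inner cells (i, j) with i + j == k.
    for k in range(2, n + m + 1):
        lo = max(1, k - m)
        hi = min(n, k - 1)
        # Iterations of this loop read only diagonals k-1 and k-2,
        # so they do not depend on each other.
        for i in range(lo, hi + 1):
            j = k - i
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
    return d[n][m]
```

Both functions return the same distance; only the traversal order differs. The wavefront order exposes little parallelism near the matrix corners, where anti-diagonals are short, which is one reason small inputs are hard to parallelize individually.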