Parallel Branch-and-Bound in multi-core multi-CPU multi-GPU heterogeneous environments

Bibliographic Details
Published in: Future generation computer systems, Vol. 56, pp. 95-109
Main authors: Vu, Trong-Tuan; Derbel, Bilel
Format: Journal Article
Language: English
Published: Elsevier B.V, 1 March 2016
Elsevier
ISSN:0167-739X, 1872-7115
Description
Abstract: We investigate the design of parallel B&B in large-scale heterogeneous compute environments where processing units can be composed of a mixture of multiple shared-memory cores, multiple distributed CPUs and multiple GPU devices. We describe two approaches addressing the critical issue of how to map the B&B workload onto the different levels of parallelism exposed by the target compute platform. We also contribute a thorough large-scale experimental study which allows us to derive a comprehensive and fair analysis of the proposed approaches under different system configurations using up to 16 GPUs and up to 512 distributed cores. Our results shed more light on the main challenges one has to face when tackling B&B algorithms, and describe efficient techniques to address them. In particular, we are able to obtain linear speed-ups at moderate scales, where adaptive load balancing among the heterogeneous compute resources is shown to have a significant impact on performance. At the largest scales, intra-node parallelism and hybrid decentralized load balancing are shown to be crucial in order to alleviate locking issues among shared-memory threads and to scale the distributed resources while optimizing communication costs and minimizing idle times.
• Key challenges in parallelizing Branch-and-Bound for large-scale systems.
• Heterogeneous load balancing for parallel Branch-and-Bound.
• Shared and distributed memory hybrid work stealing.
• CPU–GPU distributed B&B protocol with near-optimal speedup.
DOI:10.1016/j.future.2015.10.009