Locality-Aware Task Scheduling and Data Distribution for OpenMP Programs on NUMA Systems and Manycore Processors

Bibliographic Details
Published in:	Scientific Programming, Vol. 2015, no. 2015, pp. 1-16
Main Authors:	Muddukrishna, Ananya; Brorsson, Mats; Jonsson, Peter A.
Format:	Journal Article
Language:	English
Published:	Cairo, Egypt: Hindawi Publishing Corporation, 01.01.2015; John Wiley & Sons, Inc.
Subjects:	Application programming interfaces (API); Architectural design; Architectural knowledge; Benchmarking; Computer architecture; Data distribution; Distributing; Improve performance; Many-core processors; Microprocessors; Multiprocessing systems; Multitasking; Network management; Non uniform data; Performance degradation; Performance enhancement; Policies; Processor architectures; Processors; Scheduling; Scheduling algorithms; Scheduling techniques; Software architecture; Task scheduling
ISSN:	1058-9244, 1875-919X

Description
Summary:	Performance degradation due to nonuniform data access latencies has worsened on NUMA systems and can now be felt on-chip in manycore processors. Distributing data across NUMA nodes and manycore processor caches is necessary to reduce the impact of nonuniform latencies. However, techniques for distributing data are error-prone and fragile and require low-level architectural knowledge. Existing task scheduling policies favor quick load-balancing at the expense of locality and ignore NUMA node/manycore cache access latencies while scheduling. Locality-aware scheduling, in conjunction with or as a replacement for existing scheduling, is necessary to minimize NUMA effects and sustain performance. We present a data distribution and locality-aware scheduling technique for task-based OpenMP programs executing on NUMA systems and manycore processors. Our technique relieves the programmer from thinking about NUMA system/manycore processor architecture details by delegating data distribution to the runtime system, and it uses task data dependence information to guide the scheduling of OpenMP tasks to reduce data stall times. We demonstrate our technique on a four-socket AMD Opteron machine with eight NUMA nodes and on the TILEPro64 processor, and we find that data distribution and locality-aware task scheduling improve performance by up to 69% for scientific benchmarks compared to default policies while providing an architecture-oblivious approach for programmers.
DOI:	10.1155/2015/981759