An Energy and Performance Efficient DVFS Scheme for Irregular Parallel Divide-and-Conquer Algorithms on the Intel SCC

Bibliographic Details
Published in: IEEE Computer Architecture Letters, Vol. 13, no. 1, pp. 13-16
Main Authors: Yu-Liang Chou, Shaoshan Liu, Eui-Young Chung, Jean-Luc Gaudiot
Format: Journal Article
Language:English
Published: New York IEEE 01.01.2014
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN:1556-6056, 1556-6064
Description
Summary: The divide-and-conquer paradigm can be used to express many computationally significant problems, but an important subset of these applications is inherently load-imbalanced. Load balancing is a challenge for irregular parallel divide-and-conquer algorithms, and solving these applications efficiently will be a key requirement for future many-core systems. To address the load imbalance issue, instead of attempting to dynamically balance the workloads, this paper proposes an energy- and performance-efficient Dynamic Voltage and Frequency Scaling (DVFS) scheduling scheme that takes into account the load imbalance behavior exhibited by these applications. More specifically, we examine the core of the divide-and-conquer paradigm and determine that the base-case-reached point, where recursion stops, is a suitable place in a divide-and-conquer algorithm to apply the proposed DVFS scheme. To evaluate the proposed scheme, we implement four representative irregular parallel divide-and-conquer algorithms (tree traversal, quicksort, finding primes, and the n-queens puzzle) on the Intel Single-chip Cloud Computer (SCC) many-core machine. We demonstrate that, on average, the proposed scheme can improve performance by 41% while reducing energy consumption by 36% compared to a baseline that runs the whole computation at the default frequency configuration (400 MHz).
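The summary does not spell out the scheduling policy, so the following is only an illustrative sketch of the general idea and not the authors' scheme. It assumes that, once recursion stops, each leaf subproblem's size is known and can be used to choose a per-core frequency. The frequency levels, the proportional rule, and the names `pick_frequency`, `partition_to_leaves`, and `plan_frequencies` are all hypothetical; only the 400 MHz default comes from the record.

```python
from typing import List, Tuple

# Hypothetical frequency levels in MHz. Only 400 MHz (the default) is
# mentioned in the summary; the other values are illustrative assumptions.
LEVELS_MHZ = [100, 200, 400, 800]


def pick_frequency(size: int, largest: int) -> int:
    """Return the lowest level whose fraction of the top level is at least
    size / largest (an assumed proportional rule, not the paper's policy)."""
    if largest == 0:
        return LEVELS_MHZ[0]
    top = LEVELS_MHZ[-1]
    for mhz in LEVELS_MHZ:
        # Integer form of: mhz / top >= size / largest
        if mhz * largest >= size * top:
            return mhz
    return top


def partition_to_leaves(items: List[int], depth: int) -> List[List[int]]:
    """Quicksort-style splitting. Recursion stops at depth 0 or when a
    subproblem has at most one element; that is the base-case-reached point."""
    if depth == 0 or len(items) <= 1:
        return [items]
    pivot, rest = items[0], items[1:]
    left = [x for x in rest if x < pivot]
    right = [x for x in rest if x >= pivot] + [pivot]
    return (partition_to_leaves(left, depth - 1)
            + partition_to_leaves(right, depth - 1))


def plan_frequencies(items: List[int], depth: int) -> List[Tuple[int, int]]:
    """Pair each leaf subproblem's size with the frequency chosen for it."""
    leaves = partition_to_leaves(items, depth)
    largest = max(len(leaf) for leaf in leaves)
    return [(len(leaf), pick_frequency(len(leaf), largest)) for leaf in leaves]


if __name__ == "__main__":
    # Pivot 8 splits ten items into uneven subproblems of 8 and 2 elements.
    print(plan_frequencies([8, 1, 9, 3, 7, 2, 5, 6, 4, 0], depth=1))
    # prints [(8, 800), (2, 200)]
```

In this toy run the larger subproblem is assigned the highest level and the smaller one a lower level, which is one plausible way to exploit load imbalance rather than remove it. A real implementation on the SCC would replace the returned plan with calls to the platform's power management interface.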
DOI:10.1109/L-CA.2013.1