Cluster-based approach for improving graphics processing unit performance by inter streaming multiprocessors locality

Bibliographic Details
Published in:	IET computers & digital techniques Vol. 9; no. 5; pp. 275-282
Main Authors:	Keshtegar, Mohammad Mahdi; Falahati, Hajar; Hessabi, Shaahin
Format:	Journal Article
Language:	English
Published:	Beijing: The Institution of Engineering and Technology; John Wiley & Sons, Inc, 01.09.2015
Subjects:	Architecture; cache storage; cluster-based approach; cluster-based architecture; Communication; Computation; Computer simulation; Consumption; energy consumption overhead; Gain; general-purpose computing; global memory; GPU; graphics processing unit performance; Graphics processing units; High performance computing; Information sharing; interstreaming multiprocessor locality; miss events; multiprocessing systems; multithreading; off-chip memory requests; on-chip caches; parallel processing; pattern clustering; Platforms; power aware computing; Power consumption; private data cache; public memory; Similarity; SIMT cores; single instruction multiple thread core; Transistors
ISSN:	1751-8601, 1751-861X, 2095-882X, 2589-0514

Description
Summary:	As a new platform for high-performance and general-purpose computing, the graphics processing unit (GPU) is one of the most promising candidates for faster improvement in peak processing speed, low latency and high performance. Because GPUs employ multithreading to hide latency, each single instruction multiple thread (SIMT) core has only a small private data cache. Hence, in many applications these cores communicate through the global memory. Access to this public memory takes a long time and consumes a large amount of power. Moreover, memory bandwidth is limited, which is a serious challenge in parallel processing. Requests that miss in the last-level cache and are followed by accesses to the slow off-chip memory harm power and performance significantly. In this research, the authors introduce a low-overhead mechanism to reduce the off-chip memory requests that are triggered by miss events in on-chip caches. The authors propose a cluster-based architecture that captures the similarity of memory requests between SIMT cores and serves missed requests with data from adjacent cores. Simulation results show that the proposed architecture improves the geometric mean of instructions per cycle by 6.3% across the evaluated benchmarks, with a maximum gain of 22%. The geometric mean of the total energy consumption overhead is 4.8% for the evaluated applications.
DOI:	10.1049/iet-cdt.2014.0092
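
Illustration:	The idea in the summary, serving a private-cache miss from an adjacent core in the same cluster before going off chip, can be sketched with a toy model. This is not the authors' design or simulator. The names `PrivateCache` and `count_off_chip`, the LRU policy, the fixed cluster size and the request trace are all assumptions made for illustration, and the shared last-level cache is left out, so every miss that no peer can serve counts as an off-chip request.

```python
from collections import OrderedDict


class PrivateCache:
    """Tiny LRU cache standing in for one SIMT core's private data cache (assumed policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def lookup(self, addr):
        # A hit refreshes the line's LRU position.
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return True
        return False

    def fill(self, addr):
        # Insert the line and evict the least recently used one if over capacity.
        self.lines[addr] = True
        self.lines.move_to_end(addr)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)


def count_off_chip(trace, num_cores, cluster_size, capacity, share):
    """Count requests that leave the chip for a trace of (core_id, address) pairs."""
    caches = [PrivateCache(capacity) for _ in range(num_cores)]
    off_chip = 0
    for core, addr in trace:
        if caches[core].lookup(addr):
            continue  # private-cache hit
        served_by_peer = False
        if share:
            # Probe the other cores of the same cluster (hypothetical grouping by index).
            first = (core // cluster_size) * cluster_size
            last = min(first + cluster_size, num_cores)
            for peer in range(first, last):
                if peer != core and addr in caches[peer].lines:
                    served_by_peer = True
                    break
        if not served_by_peer:
            off_chip += 1  # no peer holds the line, so the request goes off chip
        caches[core].fill(addr)
    return off_chip


# Hypothetical trace: four cores touch the same address, then two touch another.
trace = [(0, 0x10), (1, 0x10), (2, 0x10), (3, 0x10), (0, 0x20), (1, 0x20)]
baseline = count_off_chip(trace, num_cores=4, cluster_size=2, capacity=4, share=False)
clustered = count_off_chip(trace, num_cores=4, cluster_size=2, capacity=4, share=True)
```

With this trace, `baseline` is 6 and `clustered` is 3: in each two-core cluster the second core to request a line finds it in its neighbour's cache. The numbers only illustrate the mechanism and say nothing about the gains reported in the summary.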