Exploring novel many-core architectures for scientific computing

Detailed bibliography
Main author:	Chen, Long
Format:	Dissertation
Language:	English
Published:	ProQuest Dissertations & Theses 01.01.2010
Subjects:	Computer Engineering
ISBN:	9781124479415, 1124479414

Abstract	The rapid shift in microprocessor architecture brought about by many-core technology presents unprecedented challenges to application developers and system software designers alike: how can the computational potential of such many-core architectures best be exploited? This dissertation studies programming issues for many-core architectures, and its contributions fall into two main areas. Optimizing the Fast Fourier Transform for IBM Cyclops-64. To understand the issues involved in designing and developing high-performance algorithms for many-core architectures, we use the fast Fourier transform (FFT) as a case study on the IBM Cyclops-64 many-core chip architecture. We analyze the optimization challenges and opportunities for FFT problems, identify domain-specific features of the target problems, and match them to key many-core architecture features. We quantify the impact of various optimization techniques and the effectiveness of the target architecture. The resulting FFT implementations achieve excellent results in terms of both speedup and absolute performance. To assist algorithm design and performance analysis, we present a model that estimates the performance of parallel FFT algorithms on an abstract many-core architecture. This abstract architecture captures generic features and parameters of several real many-core architectures; the performance model is therefore applicable to any architecture with similar features. We derive the performance model from cost functions for the three main components of an execution: memory accesses, computation, and synchronization. The experimental results demonstrate that our model predicts the performance trend accurately and can therefore provide valuable insights for designing and tuning FFT algorithms on many-core architectures.
Exploring Fine-grained Task-based Execution on Graphics Processing Unit-enabled Systems. The use of many-core Graphics Processing Units (GPUs) is gaining popularity in scientific computing. However, conventional data-parallel GPU programming paradigms, e.g., NVIDIA CUDA, cannot satisfactorily address certain issues, such as load balancing, GPU resource utilization, and overlapping fine-grained computation with communication. The problem is exacerbated when trying to exploit multiple GPUs concurrently, a configuration commonly available in many modern systems. Our solution is a fine-grained task-based execution framework that allows concurrent execution of fine-grained tasks on GPU-enabled systems. The granularity of task execution is finer than what is currently supported in CUDA; the execution of a task requires only a subset of the GPU hardware resources. Our framework provides means for solving the above issues and for efficiently utilizing the computation power provided by the GPUs. We evaluate our approach using both micro-benchmarks and a molecular dynamics (MD) application that exhibits significant load imbalance. Experimental results with a single-GPU configuration show that our fine-grained task-based solution utilizes the hardware more efficiently than the CUDA scheduler for unbalanced workloads. On multi-GPU systems, our solution achieves near-linear speedup, good dynamic load balance, and significant performance improvement over other techniques based on standard CUDA APIs.
URI	https://www.proquest.com/docview/854494593