Exploring novel many-core architectures for scientific computing

Detailed bibliography
Main author: Chen, Long
Medium: Dissertation
Language: English
Published: ProQuest Dissertations & Theses, 01.01.2010
Subjects: Computer Engineering
ISBN: 9781124479415, 1124479414
Abstract The rapid revolution in microprocessor chip architecture brought about by many-core technology presents unprecedented challenges to application developers and system software designers alike: how can the computational potential of such many-core architectures best be exploited? This dissertation studies programming issues for many-core architectures, and its contributions fall into two main areas.

Optimizing the Fast Fourier Transform for IBM Cyclops-64. To understand the issues in designing and developing high-performance algorithms for many-core architectures, we use the fast Fourier transform (FFT) as a case study on the IBM Cyclops-64 many-core chip architecture. We analyze the optimization challenges and opportunities for FFT problems, identify domain-specific features of the target problems, and match them to key features of the many-core architecture. We quantify the impact of various optimization techniques and the effectiveness of the target architecture. The resulting FFT implementations achieve excellent performance in terms of both speedup and absolute performance. To assist algorithm design and performance analysis, we present a model that estimates the performance of parallel FFT algorithms on an abstract many-core architecture. This abstract architecture captures generic features and parameters of several real many-core architectures, so the performance model applies to any architecture with similar features. We derive the model from cost functions for the three main components of an execution: memory accesses, computation, and synchronization. Experimental results demonstrate that the model accurately predicts the performance trend and can therefore provide valuable insights for designing and tuning FFT algorithms on many-core architectures.
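The abstract describes a performance model built from three cost functions (memory accesses, computation, synchronization). The Python sketch below only illustrates that additive structure for a textbook radix-2 FFT; it is not the dissertation's model. The `Machine` parameters, the per-stage traffic of 2n complex doubles, the 10-flop butterfly, the log2(P) barrier cost, and the assumption that the three costs simply add (no overlap) are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Machine:
    """Hypothetical abstract many-core machine; all parameters are illustrative."""
    cores: int             # number of cores (thread units) P, at least 1
    flops_per_core: float  # sustained floating-point operations per second per core
    mem_bandwidth: float   # aggregate shared-memory bandwidth in bytes per second
    barrier_base: float    # seconds; assumed barrier cost is barrier_base * log2(P)


def radix2_fft_time(n: int, m: Machine) -> dict:
    """Estimate the run time of an n-point radix-2 FFT as memory + compute + sync.

    Assumptions (not taken from the dissertation): each of the log2(n) stages
    reads and writes all n complex doubles (16 bytes each), performs n/2
    butterflies of 10 real flops spread evenly over P cores, and ends with
    one barrier. The three costs are simply added (no overlap).
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    if m.cores < 1:
        raise ValueError("cores must be >= 1")
    stages = int(math.log2(n))
    bytes_per_stage = 2 * n * 16        # read + write n complex doubles
    flops_per_stage = (n // 2) * 10     # 6 flops (complex multiply) + 4 (complex add and subtract)
    t_mem = stages * bytes_per_stage / m.mem_bandwidth
    t_comp = stages * flops_per_stage / (m.cores * m.flops_per_core)
    t_sync = stages * m.barrier_base * math.log2(m.cores)  # zero on a single core
    return {"memory": t_mem, "compute": t_comp, "sync": t_sync,
            "total": t_mem + t_comp + t_sync}


if __name__ == "__main__":
    # Sweep the core count to see how the three terms trade off.
    for p in (1, 4, 16, 64):
        m = Machine(cores=p, flops_per_core=1e9, mem_bandwidth=1e11, barrier_base=1e-7)
        est = radix2_fft_time(2 ** 16, m)
        print(f"P={p:3d}  total={est['total']:.3e} s  "
              f"(mem {est['memory']:.2e}, comp {est['compute']:.2e}, sync {est['sync']:.2e})")
```

Under these assumptions the compute term shrinks as 1/P, the memory term stays fixed because bandwidth is shared, and the synchronization term grows with P, so the predicted curve flattens as cores are added. A real model would replace each term with cost functions calibrated to the target chip.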
Exploring Fine-grained Task-based Execution on Graphics Processing Unit-enabled Systems. Many-core Graphics Processing Units (GPUs) are gaining popularity in scientific computing. However, conventional data-parallel GPU programming paradigms, such as NVIDIA CUDA, cannot satisfactorily address certain issues, including load balancing, GPU resource utilization, and overlapping fine-grained computation with communication. The problem is exacerbated when trying to exploit multiple GPUs concurrently, a configuration common in modern systems. Our solution is a fine-grained task-based execution framework that allows concurrent execution of fine-grained tasks on GPU-enabled systems. Task granularity is finer than what CUDA currently supports: executing a task requires only a subset of the GPU hardware resources. The framework provides the means to address the issues above and to use the computational power of the GPUs efficiently. We evaluate our approach using both micro-benchmarks and a molecular dynamics (MD) application that exhibits significant load imbalance. Experimental results with a single-GPU configuration show that our fine-grained task-based solution can utilize the hardware more efficiently than the CUDA scheduler for unbalanced workloads. On multi-GPU systems, our solution achieves near-linear speedup, good dynamic load balance, and significant performance improvement over other techniques based on standard CUDA APIs.
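Why fine-grained, dynamically scheduled tasks help with unbalanced workloads can be shown with a toy scheduling comparison. This Python sketch is not the dissertation's framework and not GPU code: each hypothetical "worker" stands in for a subset of GPU resources running one task at a time, and the task costs are made-up numbers. It compares a static block partition (each worker gets a fixed contiguous share up front) with a shared queue from which whichever worker becomes idle first pulls the next task.

```python
import heapq
from typing import Sequence


def static_block_makespan(costs: Sequence[float], workers: int) -> float:
    """Finish time when tasks are pre-split into contiguous blocks, one per worker."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    chunk = -(-len(costs) // workers) or 1   # ceiling division; avoid 0 for empty input
    return max((sum(costs[i:i + chunk]) for i in range(0, len(costs), chunk)),
               default=0.0)


def dynamic_queue_makespan(costs: Sequence[float], workers: int) -> float:
    """Finish time when each idle worker pulls the next task from a shared queue."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    busy_until = [0.0] * workers           # min-heap: time at which each worker frees up
    for c in costs:                        # tasks are dequeued in submission order
        t = heapq.heappop(busy_until)      # earliest-free worker takes the task
        heapq.heappush(busy_until, t + c)
    return max(busy_until)


if __name__ == "__main__":
    # Hypothetical unbalanced workload: a few heavy tasks followed by many light ones.
    costs = [8, 8, 1, 1, 1, 1, 1, 1]
    print("static :", static_block_makespan(costs, 2))   # 18
    print("dynamic:", dynamic_queue_makespan(costs, 2))  # 11.0
```

With the static split, one worker receives both heavy tasks and finishes at time 18 while the other idles after time 4; the shared queue reaches the ideal 22 / 2 = 11. Smaller tasks give the queue more opportunities to rebalance, which is the intuition behind finer task granularity.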
Author Chen, Long
ContentType Dissertation
Copyright Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
DBID 053
0BH
0MV
CBPLH
EU9
G20
M8-
PHGZT
PKEHL
PQEST
PQQKQ
PQUKI
DatabaseName Dissertations & Theses Europe Full Text: Science & Technology
ProQuest Dissertations and Theses Professional
Dissertations & Theses @ University of Delaware
ProQuest Dissertations & Theses Global: The Sciences and Engineering Collection
ProQuest Dissertations & Theses A&I
ProQuest Dissertations & Theses Global
ProQuest Dissertations and Theses A&I: The Sciences and Engineering Collection
ProQuest One Academic (New)
ProQuest One Academic Middle East (New)
ProQuest One Academic Eastern Edition (DO NOT USE)
ProQuest One Academic (retired)
ProQuest One Academic UKI Edition
Database_xml – sequence: 1
  dbid: G20
  name: ProQuest Dissertations & Theses Global
  url: https://www.proquest.com/pqdtglobal1
  sourceTypes: Aggregation Database
DeliveryMethod fulltext_linktorsrc
ExternalDocumentID 2280613851
Genre Dissertation/Thesis
GroupedDBID 053
0BH
0MV
8R4
8R5
CBPLH
EU9
G20
M8-
PHGZT
PKEHL
PQEST
PQQKQ
PQUKI
Q2X
ID FETCH-LOGICAL-g833-4e2f6b7c6492a86db18dce27d64ea1cf788c2083c6784e0c0db0dc333c7943d33
IEDL.DBID G20
ISBN 9781124479415
1124479414
IngestDate Mon Jun 30 05:15:51 EDT 2025
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
Notes SourceType-Dissertations & Theses-1
ObjectType-Dissertation/Thesis-1
content type line 12
PQID 854494593
PQPubID 18750
ParticipantIDs proquest_journals_854494593
PublicationCentury 2000
PublicationDate 20100101
PublicationDateYYYYMMDD 2010-01-01
PublicationDecade 2010
PublicationYear 2010
Publisher ProQuest Dissertations & Theses
SSID ssib000933042
Score 1.5273824
SourceID proquest
SourceType Aggregation Database
SubjectTerms Computer Engineering
Title Exploring novel many-core architectures for scientific computing
URI https://www.proquest.com/docview/854494593
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider ProQuest
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adissertation&rft.genre=dissertation&rft.title=Exploring+novel+many-core+architectures+for+scientific+computing&rft.DBID=053%3B0BH%3B0MV%3BCBPLH%3BEU9%3BG20%3BM8-%3BPHGZT%3BPKEHL%3BPQEST%3BPQQKQ%3BPQUKI&rft.PQPubID=18750&rft.au=Chen%2C+Long&rft.date=2010-01-01&rft.pub=ProQuest+Dissertations+%26+Theses&rft.isbn=9781124479415&rft.externalDBID=HAS_PDF_LINK&rft.externalDocID=2280613851