Supporting Applications Involving Dynamic Data Structures and Irregular Memory Access on Emerging Parallel Platforms

Bibliographic details
Main author:	Ren, Bin
Format:	Dissertation
Language:	English
Published:	ProQuest Dissertations & Theses 01.01.2014
Subjects:	Computer science
ISBN:	1321476949, 9781321476941
Online access:	Full text

Abstract	SIMD accelerators and many-core coprocessors, which offer both coarse-grained and fine-grained parallelism, are becoming increasingly popular. Streaming SIMD Extensions (SSE), Graphics Processing Units (GPUs), and the Intel Xeon Phi (MIC) can provide orders of magnitude better performance and efficiency for parallel workloads than single-core CPUs. However, parallelizing irregular applications that involve dynamic data structures and irregular memory access on these platforms is not straightforward because of their intensive control-flow dependencies and lack of memory locality. Our efforts focus on three classes of irregular applications: irregular tree and graph traversals, irregular reductions, and dynamically allocated arrays and lists. We explore mechanisms for parallelizing them on various parallel architectures from both fine-grained and coarse-grained perspectives. We first focus on the traversal of irregular trees and graphs, more specifically on a class of applications that traverse many pointer-intensive data structures (e.g., Random Forest and Regular Expressions) on various fine-grained SIMD architectures, such as Streaming SIMD Extensions (SSE) and Graphics Processing Units (GPUs). We address this problem by developing an intermediate language for specifying such traversals, followed by a run-time scheduler that maps traversals to SIMD units. A key idea in our run-time scheme is converting branches to arithmetic operations, which then allows us to use SIMD hardware. However, different SIMD architectures have different features, so a significant challenge for our previous work is to optimize applications automatically for various architectures; that is, we need to achieve performance portability. Moreover, one of the first architectural features programmers look to when optimizing their applications is the memory hierarchy.
Thus, we design a portable optimization engine for accelerating irregular data traversal applications on various SIMD architectures, with an emphasis on improving data locality and hiding memory latency. We next explore the possibility of efficiently parallelizing two irregular reduction applications on the Intel Xeon Phi architecture, an emerging many-core coprocessor architecture with long SIMD vectors, via data layout optimization. During this process, we also identify a general data management problem in the CPU-coprocessor programming model: automating and optimizing the transfer of dynamically allocated data structures between the CPU and coprocessors. For dynamic multi-dimensional arrays, we design a set of compile-time solutions involving heap layout transformation, while for other irregular data structures such as linked lists, we improve the existing shared-memory runtime solution to reduce transfer costs. Dynamically allocated data structures such as lists are also commonly used in high-level programming languages such as Python, where they support dynamic, flexible features that increase programming productivity. To parallelize applications written in such languages on both coarse-grained and fine-grained parallel platforms, we design a compilation system that linearizes dynamic data structures into arrays and invokes low-level multi-core and many-core libraries. A critical issue with our linearization method is that it incurs extra data structure transformation overhead, especially for irregular data structures that are not reused frequently. To address this challenge, we design a set of transformation optimization algorithms, including an inter-procedural Partial Redundancy Elimination (PRE) algorithm, to minimize the data transformation overhead automatically.
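
The abstract's idea of converting branches to arithmetic operations can be illustrated with a small sketch. This is not the dissertation's intermediate language or run-time scheduler: the tree layout, the array names (`FEATURE`, `THRESHOLD`, `LEFT`, `LABEL`), and the function names are all assumptions made for this example. It only shows how an if/else child selection in a decision-tree traversal can be replaced by adding a 0/1 comparison result to a child index, so that every sample in a batch executes the same instruction sequence, which is the property SIMD lanes need.

```python
# Hypothetical illustration only; none of these names come from the dissertation.

# A complete binary decision tree of depth 2, stored in flat arrays.
# Inner node i tests sample[FEATURE[i]] against THRESHOLD[i].
# Its left child is LEFT[i] and its right child is LEFT[i] + 1.
# Leaves are marked with LEFT[i] == -1 and carry a LABEL.
FEATURE = [0, 1, 1, 0, 0, 0, 0]
THRESHOLD = [0.5, 0.3, 0.7, 0.0, 0.0, 0.0, 0.0]
LEFT = [1, 3, 5, -1, -1, -1, -1]
LABEL = [None, None, None, "A", "B", "C", "D"]


def traverse_branchy(sample):
    """Reference traversal: the child is chosen with an if/else branch."""
    node = 0
    while LEFT[node] != -1:
        if sample[FEATURE[node]] < THRESHOLD[node]:
            node = LEFT[node]
        else:
            node = LEFT[node] + 1
    return LABEL[node]


def traverse_batch(samples, depth=2):
    """Branch-free step: the comparison result (0 or 1) is added to the
    left-child index, so all samples advance in lockstep for a fixed
    number of levels. Assumes every leaf sits exactly `depth` levels down."""
    nodes = [0] * len(samples)
    for _ in range(depth):
        nodes = [
            LEFT[n] + int(s[FEATURE[n]] >= THRESHOLD[n])
            for n, s in zip(nodes, samples)
        ]
    return [LABEL[n] for n in nodes]


if __name__ == "__main__":
    batch = [(0.2, 0.1), (0.2, 0.9), (0.8, 0.1), (0.8, 0.9)]
    print(traverse_batch(batch))
```

In this sketch the list comprehension stands in for one SIMD operation over all lanes; on real hardware the same arithmetic would be expressed with vector compare and add instructions, and the flat arrays would be the gather targets.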
Author	Ren, Bin
ContentType	Dissertation
Copyright	Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
DBID	053 0BH 0J6 CBPLH EU9 G20 M8- OK5 PHGZT PKEHL PQEST PQQKQ PQUKI
DatabaseName	Dissertations & Theses Europe Full Text: Science & Technology ProQuest Dissertations and Theses Professional Dissertations & Theses @ Ohio State University ProQuest Dissertations & Theses Global: The Sciences and Engineering Collection ProQuest Dissertations & Theses A&I ProQuest Dissertations & Theses Global ProQuest Dissertations and Theses A&I: The Sciences and Engineering Collection Dissertations & Theses @ Big Ten Academic Alliance ProQuest One Academic (New) ProQuest One Academic Middle East (New) ProQuest One Academic Eastern Edition (DO NOT USE) ProQuest One Academic (retired) ProQuest One Academic UKI Edition
DatabaseTitle	ProQuest One Academic Middle East (New) ProQuest One Academic Eastern Edition Dissertations & Theses @ Ohio State University ProQuest Dissertations & Theses Global: The Sciences and Engineering Collection ProQuest Dissertations and Theses Professional ProQuest Dissertations and Theses A&I: The Sciences and Engineering Collection ProQuest Dissertations & Theses Global Dissertations & Theses Europe Full Text: Science & Technology ProQuest One Academic UKI Edition Dissertations & Theses @ CIC Institutions ProQuest One Academic ProQuest Dissertations & Theses A&I ProQuest One Academic (New)
DatabaseTitleList	ProQuest One Academic Middle East (New)
Database_xml	– sequence: 1 dbid: G20 name: ProQuest Dissertations & Theses Global url: https://www.proquest.com/pqdtglobal1 sourceTypes: Aggregation Database
DeliveryMethod	fulltext_linktorsrc
Discipline	Computer Science
ExternalDocumentID	3564273631
Genre	Dissertation/Thesis
GroupedDBID	053 0BH 0J6 8R4 8R5 CBPLH EU9 G20 M8- OK5 PHGZT PKEHL PQEST PQQKQ PQUKI Q2X
ID	FETCH-LOGICAL-p539-b478ad99ee9efbd36d7c7342d69d4582ee2cca288456db2eca7ded88b71b94f53
IEDL.DBID	G20
ISBN	1321476949 9781321476941
IngestDate	Mon Jun 30 06:03:12 EDT 2025
IsPeerReviewed	false
IsScholarly	false
Language	English
LinkModel	DirectLink
MergedId	FETCHMERGED-LOGICAL-p539-b478ad99ee9efbd36d7c7342d69d4582ee2cca288456db2eca7ded88b71b94f53
Notes	SourceType-Dissertations & Theses-1 ObjectType-Dissertation/Thesis-1 content type line 12
PQID	1647259297
PQPubID	18750
ParticipantIDs	proquest_journals_1647259297
PublicationCentury	2000
PublicationDate	20140101
PublicationDateYYYYMMDD	2014-01-01
PublicationDate_xml	– month: 01 year: 2014 text: 20140101 day: 01
PublicationDecade	2010
PublicationYear	2014
Publisher	ProQuest Dissertations & Theses
Publisher_xml	– name: ProQuest Dissertations & Theses
SSID	ssib000933042
Score	1.6440281
SourceID	proquest
SourceType	Aggregation Database
SubjectTerms	Computer science
Title	Supporting Applications Involving Dynamic Data Structures and Irregular Memory Access on Emerging Parallel Platforms
URI	https://www.proquest.com/docview/1647259297