Locality-Aware Task Scheduling and Data Distribution for OpenMP Programs on NUMA Systems and Manycore Processors


Published in: Scientific Programming, Volume 2015; Issue 2015; pp. 1-16
Main authors: Muddukrishna, Ananya; Brorsson, Mats; Jonsson, Peter A.
Format: Journal Article
Language: English
Published: Cairo, Egypt: Hindawi Publishing Corporation, 01.01.2015
John Wiley & Sons, Inc
ISSN: 1058-9244, 1875-919X
Abstract: Performance degradation due to nonuniform data access latencies has worsened on NUMA systems and can now be felt on-chip in manycore processors. Distributing data across NUMA nodes and manycore processor caches is necessary to reduce the impact of nonuniform latencies. However, techniques for distributing data are error-prone and fragile and require low-level architectural knowledge. Existing task scheduling policies favor quick load-balancing at the expense of locality and ignore NUMA node/manycore cache access latencies while scheduling. Locality-aware scheduling, in conjunction with or as a replacement for existing scheduling, is necessary to minimize NUMA effects and sustain performance. We present a data distribution and locality-aware scheduling technique for task-based OpenMP programs executing on NUMA systems and manycore processors. Our technique relieves the programmer from thinking about NUMA system/manycore processor architecture details by delegating data distribution to the runtime system, and it uses task data dependence information to guide the scheduling of OpenMP tasks to reduce data stall times. We demonstrate our technique on a four-socket AMD Opteron machine with eight NUMA nodes and on the TILEPro64 processor, and we show that data distribution and locality-aware task scheduling improve performance by up to 69% for scientific benchmarks compared to default policies while still providing an architecture-oblivious approach for programmers.
Author Brorsson, Mats
Jonsson, Peter A.
Muddukrishna, Ananya
BackLink https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-166580 (View record from Swedish Publication Index, Kungliga Tekniska Högskolan)
https://urn.kb.se/resolve?urn=urn:nbn:se:ri:diva-41881 (View record from Swedish Publication Index)
CitedBy_id crossref_primary_10_1002_cpe_6887
crossref_primary_10_1145_3293448
Cites_doi 10.1007/978-3-642-02303-3_7
10.1145/1555815.1555779
10.1007/978-3-642-36949-0_39
10.3233/SPR-2010-0307
10.1007/978-3-642-21487-5_6
10.1177/1094342011434065
10.1109/tc.2010.199
10.1007/978-3-642-30961-8_14
10.1145/2426642.2259000
10.1007/978-3-642-40698-0_12
10.1109/mm.2010.31
10.1109/tpds.2012.322
10.1016/j.procs.2013.05.201
10.1007/978-3-642-19328-6_27
10.1007/978-3-642-24650-0_15
ContentType Journal Article
Copyright Copyright © 2015 Ananya Muddukrishna et al.
Copyright © 2015 Ananya Muddukrishna et al.; This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
DOI 10.1155/2015/981759
DatabaseName الدوريات العلمية والإحصائية - e-Marefa Academic and Statistical Periodicals
معرفة - المحتوى العربي الأكاديمي المتكامل - e-Marefa Academic Complete
Hindawi Publishing Complete
Hindawi Publishing Subscription Journals
Hindawi Publishing Open Access
CrossRef
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
SwePub
SwePub Articles
SWEPUB Kungliga Tekniska Högskolan
SWEPUB Freely available online
SwePub Articles full text
Discipline Computer Science
EISSN 1875-919X
Editor Chandrasekaran, Sunita
EndPage 16
ExternalDocumentID oai_DiVA_org_ri_41881
oai_DiVA_org_kth_166580
10_1155_2015_981759
1076564
ISICitedReferencesCount 12
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000364899300001&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 1058-9244
1875-919X
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 2015
Language English
License This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
http://creativecommons.org/licenses/by/3.0
OpenAccessLink https://dx.doi.org/10.1155/2015/981759
PQID 2008026179
PQPubID 2046410
PageCount 16
PublicationCentury 2000
PublicationDate 2015-01-01
PublicationDateYYYYMMDD 2015-01-01
PublicationDecade 2010
PublicationPlace Cairo, Egypt
PublicationPlace_xml – name: Cairo, Egypt
– name: New York
PublicationTitle Scientific programming
PublicationYear 2015
Publisher Hindawi Publishing Corporation
John Wiley & Sons, Inc
SourceID swepub
proquest
crossref
hindawi
emarefa
SourceType Open Access Repository
Aggregation Database
Enrichment Source
Index Database
Publisher
StartPage 1
SubjectTerms Application programming interfaces (API)
Architectural design
Architectural knowledge
Benchmarking
Computer architecture
Data distribution
Distributing
Improve performance
Many-core processors
Microprocessors
Multiprocessing systems
Multitasking
Network management
Non uniform data
Performance degradation
Performance enhancement
Policies
Processor architectures
Processors
Scheduling
Scheduling algorithms
Scheduling techniques
Software architecture
Task scheduling
Title Locality-Aware Task Scheduling and Data Distribution for OpenMP Programs on NUMA Systems and Manycore Processors
URI https://search.emarefa.net/detail/BIM-1076564
https://dx.doi.org/10.1155/2015/981759
https://www.proquest.com/docview/2008026179
https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-166580
https://urn.kb.se/resolve?urn=urn:nbn:se:ri:diva-41881
Volume 2015