Poster: Performance challenges in modular parallel programs

Bibliographic Details
Title: Poster: Performance challenges in modular parallel programs
Authors: Aksenov, Vitalii, Acar, Umut, A., Charguéraud, Arthur, Rainey, Mike
Contributors: ITMO University, Russia; Langages de programmation, types, compilation et preuves (GALLIUM), Centre Inria de Paris, Institut National de Recherche en Informatique et en Automatique (Inria); Carnegie Mellon University, Pittsburgh (CMU); Laboratoire des sciences de l'ingénieur, de l'informatique et de l'imagerie (ICube), Université de Strasbourg (UNISTRA), Centre National de la Recherche Scientifique (CNRS); European Project: 308246, EC:FP7:ERC, ERC-2012-StG_20111012, DEEPSEA (2013)
Source: PPoPP 2018 - 23rd ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming ; https://inria.hal.science/hal-01887717 ; PPoPP 2018 - 23rd ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, Feb 2018, Vienna, Austria. ⟨10.1145/3178487.3178516⟩
Publisher Information: CCSD
Publication Year: 2018
Collection: Inserm: HAL (Institut national de la santé et de la recherche médicale)
Subject Terms: [INFO.INFO-DC]Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC]
Subject Geographic: Vienna, Austria
Description: International audience ; Over the past decade, many programming languages and systems for parallel computing have been developed, including Cilk, Fork/Join Java, Habanero Java, Parallel Haskell, Parallel ML, and X10. Although these systems raise the level of abstraction at which parallel code is written, performance still requires the programmer to perform extensive optimizations and tuning, often by taking various architectural details into account. One such key optimization is granularity control, which requires the programmer to determine when and how parallel tasks should be sequentialized. In this paper, we briefly describe some of the challenges associated with automatic granularity control when trying to achieve portable performance for parallel programs with arbitrary nesting of parallel constructs. We consider a result from the functional-programming community, whose starting point is to consider an "oracle" that can predict the work of parallel codes and thereby control granularity. We discuss the challenges in implementing such an oracle and proving that it has the desired theoretical properties under the nested-parallel programming model. Context: The proliferation of multicore hardware in the past decade has brought shared-memory parallelism into the mainstream. This change has led to much research on implicit threading, a.k.a. implicit parallelism, which seeks to make parallel programming easier by delegating certain tedious but important details, such as the scheduling of parallel tasks, to the compiler and the run-time system. Implementations include OpenMP, Cilk, TBB, X10, and Parallel ML.
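The granularity-control idea described in the abstract can be illustrated with a small sketch (a hypothetical example, not code from the paper): an "oracle" predicts the work of each subrange of a parallel map, and any subrange predicted to cost less than a cutoff is sequentialized instead of being forked as a parallel task. The `CUTOFF` value and the range-size cost model are illustrative assumptions.

```python
# Hypothetical sketch of oracle-guided granularity control; not the paper's
# implementation. The "oracle" here is a toy cost model: the predicted work
# of mapping over a subrange is simply the subrange length.
from concurrent.futures import ThreadPoolExecutor

CUTOFF = 1000  # assumed sequentialization threshold (illustrative value)

def predicted_work(lo, hi):
    # Toy oracle: predicts work proportional to the number of elements.
    return hi - lo

def chunks(lo, hi):
    # Split recursively until the oracle predicts a chunk is cheap enough
    # to run sequentially; each leaf becomes one parallel task.
    if predicted_work(lo, hi) <= CUTOFF:
        return [(lo, hi)]
    mid = (lo + hi) // 2
    return chunks(lo, mid) + chunks(mid, hi)

def sum_squares(xs, lo, hi):
    # Sequentialized leaf computation: no further task creation below CUTOFF.
    return sum(x * x for x in xs[lo:hi])

data = list(range(10_000))
with ThreadPoolExecutor() as pool:
    futures = [pool.submit(sum_squares, data, lo, hi)
               for lo, hi in chunks(0, len(data))]
    total = sum(f.result() for f in futures)
print(total)  # same result as the sequential sum of squares
```

With a perfect oracle, task-creation overhead is paid only for chunks whose work amortizes it; the challenge the poster discusses is that real predictions are approximate, and nesting of parallel constructs makes the cost model and its theoretical guarantees much harder to establish.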
Document Type: conference object
Language: English
Relation: info:eu-repo/grantAgreement/EC/FP7/308246/EU/Parallelism and Beyond: Dynamic Parallel Computation for Efficiency and High Performance/DEEPSEA
DOI: 10.1145/3178487.3178516
Availability: https://inria.hal.science/hal-01887717
https://inria.hal.science/hal-01887717v1/document
https://inria.hal.science/hal-01887717v1/file/main.pdf
https://doi.org/10.1145/3178487.3178516
Rights: info:eu-repo/semantics/OpenAccess
Accession Number: edsbas.C8E18CBF
Database: BASE