XcalableMP PGAS programming language : from programming model to applications

XcalableMP is a directive-based parallel programming language based on Fortran and C, supporting a Partitioned Global Address Space (PGAS) model for distributed-memory parallel systems. This open access book presents the XcalableMP language, from its programming model and basic concepts to the experience...
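
As a taste of the global-view model the book describes, the sketch below uses the data-mapping and work-mapping directives listed in the first chapter of the contents (nodes, template, distribute, align, loop with a reduction clause, and task). It is only a minimal, illustrative example written against the published XcalableMP C syntax; the node count, array size, and variable names are assumptions and do not come from the book.

    #include <stdio.h>
    #define N 16

    /* Map four execution nodes, and a template of N indices onto them. */
    #pragma xmp nodes p(4)
    #pragma xmp template t(0:N-1)
    #pragma xmp distribute t(block) onto p

    /* Align the global array a[] with the template, i.e. distribute it. */
    double a[N];
    #pragma xmp align a[i] with t(i)

    int main(void)
    {
        int i;
        double sum = 0.0;

        /* Each node executes only the iterations of its own block. */
    #pragma xmp loop on t(i)
        for (i = 0; i < N; i++)
            a[i] = (double)i;

        /* Global-view reduction across the distributed array. */
    #pragma xmp loop on t(i) reduction(+:sum)
        for (i = 0; i < N; i++)
            sum += a[i];

        /* Print from the first node only. */
    #pragma xmp task on p(1)
        printf("sum = %f\n", sum);

        return 0;
    }

With the Omni compiler described in the book, such a program is typically compiled with the xmpcc driver and launched through an MPI launcher, although the exact commands depend on the installation (see the "Creation of Execution Binary" sections in the contents below).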

Detailed bibliography
Main author: Sato, Mitsuhisa
Format: eBook; Book
Language: English
Published: Singapore : Springer, 2021
Springer Nature
Springer Singapore Pte. Limited
RIKEN Center for Computational Science
Edition: 1
ISBN: 9789811576829, 9811576823, 9789811576836, 9811576831
Online access: Get full text
Contents:
  • Intro -- Preface -- Contents -- XcalableMP Programming Model and Language -- 1 Introduction -- 1.1 Target Hardware -- 1.2 Execution Model -- 1.3 Data Model -- 1.4 Programming Models -- 1.4.1 Partitioned Global Address Space -- 1.4.2 Global-View Programming Model -- 1.4.3 Local-View Programming Model -- 1.4.4 Mixture of Global View and Local View -- 1.5 Base Languages -- 1.5.1 Array Section in XcalableMP C -- 1.5.2 Array Assignment Statement in XcalableMP C -- 1.6 Interoperability -- 2 Data Mapping -- 2.1 nodes Directive -- 2.2 template Directive -- 2.3 distribute Directive -- 2.3.1 Block Distribution -- 2.3.2 Cyclic Distribution -- 2.3.3 Block-Cyclic Distribution -- 2.3.4 Gblock Distribution -- 2.3.5 Distribution of Multi-Dimensional Templates -- 2.4 align Directive -- 2.5 Dynamic Allocation of Distributed Array -- 2.6 template_fix Construct -- 3 Work Mapping -- 3.1 task and tasks Construct -- 3.1.1 task Construct -- 3.1.2 tasks Construct -- 3.2 loop Construct -- 3.2.1 Reduction Computation -- 3.2.2 Parallelizing Nested Loop -- 3.3 array Construct -- 4 Data Communication -- 4.1 shadow Directive and reflect Construct -- 4.1.1 Declaring Shadow -- 4.1.2 Updating Shadow -- 4.2 gmove Construct -- 4.2.1 Collective Mode -- 4.2.2 In Mode -- 4.2.3 Out Mode -- 4.3 barrier Construct -- 4.4 reduction Construct -- 4.5 bcast Construct -- 4.6 wait_async Construct -- 4.7 reduce_shadow Construct -- 5 Local-View Programming -- 5.1 Introduction -- 5.2 Coarray Declaration -- 5.3 Put Communication -- 5.4 Get Communication -- 5.5 Synchronization -- 5.5.1 Sync All -- 5.5.2 Sync Images -- 5.5.3 Sync Memory -- 6 Procedure Interface -- 7 XMPT Tool Interface -- 7.1 Overview -- 7.2 Specification -- 7.2.1 Initialization -- 7.2.2 Events -- References -- Implementation and Performance Evaluation of Omni Compiler -- 1 Overview -- 2 Implementation -- 2.1 Operation Flow
  • 2.2 Example of Code Translation -- 2.2.1 Distributed Array -- 2.2.2 Loop Statement -- 2.2.3 Communication -- 3 Installation -- 3.1 Overview -- 3.2 Get Source Code -- 3.2.1 From GitHub -- 3.2.2 From Our Website -- 3.3 Software Dependency -- 3.4 General Installation -- 3.4.1 Build and Install -- 3.4.2 Set PATH -- 3.5 Optional Installation -- 3.5.1 OpenACC -- 3.5.2 XcalableACC -- 3.5.3 One-Sided Library -- 4 Creation of Execution Binary -- 4.1 Compile -- 4.2 Execution -- 4.2.1 XcalableMP and XcalableACC -- 4.2.2 OpenACC -- 4.3 Cooperation with Profiler -- 4.3.1 Scalasca -- 4.3.2 tlog -- 5 Performance Evaluation -- 5.1 Experimental Environment -- 5.2 EP STREAM Triad -- 5.2.1 Design -- 5.2.2 Implementation -- 5.2.3 Evaluation -- 5.3 High-Performance Linpack -- 5.3.1 Design -- 5.3.2 Implementation -- 5.3.3 Evaluation -- 5.4 Global Fast Fourier Transform -- 5.4.1 Design -- 5.4.2 Implementation -- 5.4.3 Evaluation -- 5.5 RandomAccess -- 5.5.1 Design -- 5.5.2 Implementation -- 5.5.3 Evaluation -- 5.6 Discussion -- 6 Conclusion -- References -- Coarrays in the Context of XcalableMP -- 1 Introduction -- 2 Requirements from Language Specifications -- 2.1 Images Mapped to XMP Nodes -- 2.2 Allocation of Coarrays -- 2.3 Communication -- 2.4 Synchronization -- 2.5 Subarrays and Data Contiguity -- 2.6 Coarray C Language Specifications -- 3 Implementation -- 3.1 Omni XMP Compiler Framework -- 3.2 Allocation and Registration -- 3.2.1 Three Methods of Memory Management -- 3.2.2 Initial Allocation for Static Coarrays -- 3.2.3 Runtime Allocation for Allocatable Coarrays -- 3.3 PUT/GET Communication -- 3.3.1 Determining the Possibility of DMA -- 3.3.2 Buffering Communication Methods -- 3.3.3 Non-blocking PUT Communication -- 3.3.4 Optimization of GET Communication -- 3.4 Runtime Libraries -- 3.4.1 Fortran Wrapper -- 3.4.2 Upper-layer Runtime (ULR) Library
  • 3.4.3 Lower-layer Runtime (LLR) Library -- 3.4.4 Communication Libraries -- 4 Evaluation -- 4.1 Fundamental Performance -- 4.2 Non-blocking Communication -- 4.3 Application Program -- 4.3.1 Coarray Version of the Himeno Benchmark -- 4.3.2 Measurement Result -- 4.3.3 Productivity -- 5 Related Work -- 6 Conclusion -- References -- XcalableACC: An Integration of XcalableMP and OpenACC -- 1 Introduction -- 1.1 Hardware Model -- 1.2 Programming Model -- 1.2.1 XMP Extensions -- 1.2.2 OpenACC Extensions -- 1.3 Execution Model -- 1.4 Data Model -- 2 XcalableACC Language -- 2.1 Data Mapping -- Example -- 2.2 Work Mapping -- Restriction -- Example 1 -- Example 2 -- 2.3 Data Communication and Synchronization -- Example -- 2.4 Coarrays -- Restriction -- Example -- 2.5 Handling Multiple Accelerators -- 2.5.1 devices Directive -- Example -- 2.5.2 on_device Clause -- 2.5.3 layout Clause -- Example -- 2.5.4 shadow Clause -- Example -- 2.5.5 barrier_device Construct -- Example -- 3 Omni XcalableACC Compiler -- 4 Performance of Lattice QCD Application -- 4.1 Overview of Lattice QCD -- 4.2 Implementation -- 5 Performance Evaluation -- 5.1 Result -- 5.2 Discussion -- 6 Productivity Improvement -- 6.1 Requirement for Productive Parallel Language -- 6.2 Quantitative Evaluation by Delta Source Lines of Codes -- 6.3 Discussion -- References -- Mixed-Language Programming with XcalableMP -- 1 Background -- 2 Translation by Omni Compiler -- 3 Functions for Mixed-Language -- 3.1 Function to Call MPI Program from XMP Program -- 3.2 Function to Call XMP Program from MPI Program -- 3.3 Function to Call XMP Program from Python Program -- 3.3.1 From Parallel Python Program -- 3.3.2 From Sequential Python Program -- 4 Application to Order/Degree Problem -- 4.1 What Is Order/Degree Program -- 4.2 Implementation -- 4.3 Evaluation -- 5 Conclusion -- References
  • Three-Dimensional Fluid Code with XcalableMP -- 1 Introduction -- 2 Global-View Programming Model -- 2.1 Domain Decomposition Methods -- 2.2 Performance on the K Computer -- 2.2.1 Comparison with Hand-Coded MPI Program -- 2.2.2 Optimization for SIMD -- 2.2.3 Optimization for Allocatable Arrays -- 3 Local-View Programming Model -- 3.1 Communications Using Coarray -- 3.2 Performance on the K Computer -- 4 Summary -- References -- Hybrid-View Programming of Nuclear Fusion Simulation Code in XcalableMP -- 1 Introduction -- 2 Nuclear Fusion Simulation Code -- 2.1 Gyrokinetic PIC Simulation -- 2.2 GTC -- 3 Implementation of GTC-P by Hybrid-view Programming -- 3.1 Hybrid-View Programming Model -- 3.2 Implementation Based on the XMP-Localview Model: XMP-localview -- 3.3 Implementation Based on the XMP-Hybridview Model: XMP-Hybridview -- 4 Performance Evaluation -- 4.1 Experimental Setting -- 4.2 Results -- 4.3 Productivity and Performance -- 5 Related Research -- 6 Conclusion -- References -- Parallelization of Atomic Image Reconstruction from X-ray Fluorescence Holograms with XcalableMP -- 1 Introduction -- 2 X-ray Fluorescence Holography -- 2.1 Reconstruction of Atomic Images -- 2.2 Analysis Procedure of XFH -- 3 Parallelization -- 3.1 Parallelization of Reconstruction of Two-Dimensional Atomic Images by OpenMP -- 3.2 Parallelization of Reconstruction of Three-dimensional Atomic Images by XcalableMP -- 4 Performance Evaluation -- 4.1 Performance Results of Reconstruction of Two-Dimensional Atomic Images -- 4.2 Performance Results of Reconstruction of Three-dimensional Atomic Images -- 4.3 Comparison of Parallelization with MPI -- 5 Conclusion -- References -- Multi-SPMD Programming Model with YML and XcalableMP -- 1 Introduction -- 2 Background: International Collaborations for the Post-Petascale and Exascale Computing -- 3 Multi-SPMD Programming Model
  • 3.1 Overview -- 3.2 YML -- 3.3 OmniRPC-MPI -- 4 Application Development in the mSPMD Programming Environment -- 4.1 Task Generator -- 4.2 Workflow Development -- 4.3 Workflow Execution -- 5 Experiments -- 6 Eigen Solver on the mSPMD Programming Model -- 6.1 Implicitly Restarted Arnoldi Method (IRAM), Multiple Implicitly Restarted Arnoldi Method (MIRAM) and Their Implementations for the mSPMD Programming Model -- 6.2 Experiments -- 7 Fault-Tolerance Features in the mSPMD Programming Model -- 7.1 Overview and Implementation -- 7.2 Experiments -- 8 Runtime Correctness Check for the mSPMD Programming Model -- 8.1 Overview and Implementation -- 8.2 Experiments -- 9 Summary -- References -- XcalableMP 2.0 and Future Directions -- 1 Introduction -- 2 XcalableMP on Fugaku -- 2.1 Performance of XcalableMP Global View Programming -- 2.2 Performance of XcalableMP Local View Programming -- 3 Global Task Parallel Programming -- 3.1 OpenMP and XMP Tasklet Directive -- 3.2 A Proposal for Global Task Parallel Programming -- 3.3 Prototype Design of Code Transformation -- 3.4 Preliminary Performance -- 3.5 Communication Optimization for Manycore Clusters -- 4 Retrospectives and Challenges for Future PGAS Models -- 4.1 Low-Level Communication Layer for PGAS Model -- 4.2 XcalableMP as a DSL for Stencil Applications -- 4.3 XcalableMP API: Compiler-Free Approach -- 4.4 Global Task Parallel Programming Model for Accelerators -- References
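
Several of the chapters above (the shadow directive and reflect construct in the language chapter, and the stencil-style fluid and Lattice QCD application chapters) revolve around halo exchange for distributed stencil loops. The following sketch illustrates that pattern in XcalableMP C; the array names, problem size, and the simple averaging kernel are illustrative assumptions rather than code from the book.

    #define N 64

    #pragma xmp nodes p(4)
    #pragma xmp template t(0:N-1)
    #pragma xmp distribute t(block) onto p

    double u[N], un[N];
    #pragma xmp align u[i] with t(i)
    #pragma xmp align un[i] with t(i)
    /* Declare one halo (shadow) element on each side of the local block. */
    #pragma xmp shadow u[1:1]

    void step(void)
    {
        int i;

        /* Exchange halo regions with neighbouring nodes. */
    #pragma xmp reflect (u)

        /* Each node updates the iterations it owns; u[i-1] and u[i+1]
           at block boundaries are read from the shadow elements. */
    #pragma xmp loop on t(i)
        for (i = 1; i < N - 1; i++)
            un[i] = 0.5 * (u[i - 1] + u[i + 1]);
    }

The XcalableMP specification also allows reflect to be issued asynchronously and completed later with the wait_async construct listed in the contents, so that the halo exchange can be overlapped with computation.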