Software for Exascale Computing - SPPEXA 2016-2019
This open access book summarizes the research carried out and the results obtained during the second funding phase of the Priority Program 1648 "Software for Exascale Computing" (SPPEXA) of the German Research Foundation (DFG), as presented at the SPPEXA Symposium held in Dresden on October 21-23, 2019. In th...
Saved in:
| Main authors: | , , , , |
|---|---|
| Format: | E-book |
| Language: | English |
| Publication details: | Cham: Springer Nature (Springer International Publishing AG), 2020 |
| Edition: | 1 |
| Series: | Lecture Notes in Computational Science and Engineering |
| Subject: | |
| ISBN: | 3030479560, 9783030479565, 3030479552, 9783030479558 |
| Online access: | Get full text |
Contents:
- 3.1.2 Region and Halo Specifications -- 3.1.3 Global Boundary Specification -- 3.1.4 Halo Wrapper -- 3.1.5 Stencil Operator and Stencil Iterator -- 3.1.6 Performance Comparison -- 3.2 Parallel Algorithms: Sort -- 3.2.1 Preliminaries -- 3.2.2 Related Work -- 3.2.3 Histogram Sort -- 3.2.4 Evaluation and Conclusion -- 4 Use Cases and Applications -- 4.1 A Productivity Study: The Cowichan Benchmarks -- 4.1.1 The Cowichan Problems -- 4.1.2 The Parallel Programming Approaches Compared -- 4.1.3 Implementation Challenges and DASH Features Used -- 4.1.4 Evaluation -- 4.1.5 Summary -- 4.2 Task-Based Application Study: LULESH -- 5 Outlook and Conclusion -- 5.1 MEPHISTO -- References -- ESSEX: Equipping Sparse Solvers For Exascale -- 1 Introduction -- 2 Summary of the ESSEX-I Software Structure -- 3 Algorithmic Developments -- 3.1 Preconditioners (ppOpen-SOL) -- 3.1.1 Regularization -- 3.1.2 Hierarchical Parallel Reordering -- 3.1.3 Multiplicative Schwarz-Type Block Red-Black Gauß-Seidel Smoother -- 3.2 The BEAST Framework for Interior Definite Generalized Eigenproblems -- 3.2.1 Projector Types -- 3.2.2 Flexibility, Adaptivity and Auto-Tuning -- 3.2.3 Levels of Parallelism -- 3.2.4 A Posteriori Cross-Interval Orthogonalization -- 3.2.5 Robustness and Resilience -- 3.3 Further Progress on Contour Integral-Based Eigensolvers -- 3.3.1 Relationship Among Contour Integral-Based Eigensolvers -- 3.3.2 Extension to Nonlinear Eigenvalue Problems -- 3.4 Recursive Algebraic Coloring Engine (RACE) -- 4 Hardware Efficiency and Scalability -- 4.1 Tall and Skinny Matrix-Matrix Multiplication (TSMM) on GPGPUs -- 4.2 BEAST Performance and Scalability on Modern Hardware -- 4.2.1 Node-Level Performance -- 4.2.2 Massively Parallel Performance -- 5 Scalable and Sustainable Software -- 5.1 PHIST and the Block-ILU -- 5.1.1 Integration of the Block-ILU Preconditioning Technique
- 5.2 BEAST -- 5.3 CRAFT -- 5.4 CRAFT Benchmark Application -- 5.5 ScaMaC -- 6 Application Results -- 6.1 Eigensolvers in Quantum Physics: Graphene, Topological Insulators, and Beyond -- 6.2 New Applications in Nonlinear Dynamical Systems -- 7 International Collaborations -- References -- ExaDG: High-Order Discontinuous Galerkin for the Exa-Scale -- 1 Introduction -- 2 Node-Level Performance Through Matrix-Free Implementation -- 2.1 Implementation of Sum Factorization in the deal.II Library -- 2.2 Efficiency of Matrix-Free Implementation -- 3 Performance-Optimized Conjugate Gradient Methods -- 4 Geometric Multigrid Methods in Distributed Environments -- 5 Fast Tensor Product Schwarz Smoothers -- 5.1 The Laplacian on Cartesian Meshes -- 5.2 General Geometry -- 5.3 Linear Elasticity -- 6 High-performance Simulations of Incompressible Flows -- 7 hyper.deal: Extending the Matrix-Free Kernels to Higher Dimensions -- 8 Outlook -- References -- Exa-Dune: Flexible PDE Solvers, Numerical Methods and Applications -- 1 Introduction -- 2 Asynchronicity and Fault Tolerance -- 2.1 Abstract Layer for Asynchronicity -- 2.2 Parallel C++ Exception Handling -- 2.3 Compressed in-Memory Checkpointing for Linear Solvers -- 2.4 Communication Aware Krylov Solvers -- 3 Hardware-Aware, Robust and Scalable Linear Solvers -- 3.1 Strong Smoothers on the GPU: Fast Approximate Inverses with Conventional and Machine Learning Approaches -- 3.2 Autotuning with Artificial Neural Networks -- 3.3 Further Development of Sum-Factorized Matrix-Free DG Methods -- 3.4 Hybrid Solvers for Discontinuous Galerkin Schemes -- 3.5 Horizontal Vectorization of Block Krylov Methods -- 4 Adaptive Multiscale Methods -- 4.1 Continuous Problem and Discretization -- 4.2 Model Reduction -- 4.3 Implementation -- 5 Uncertainty Quantification -- 6 Land-Surface Flow Application
- Intro -- Preface -- Contents -- Acronyms -- Part I SPPEXA: The Priority Program -- Software for Exascale Computing: Some Remarks on the Priority Program SPPEXA -- 1 Preparation -- 2 Design Principles -- 3 Funded Projects and Internal Structure -- 4 SPPEXA Goes International -- 5 Joint Coordinated Activities -- 6 HPC Goes Data -- 7 Shaping the Landscape -- 8 Concluding Remarks -- Appendix 1: Qualification -- Appendix 2: Software from Project Consortia -- Appendix 3: Project Consortia Key Publications -- A Perspective on the SPPEXA Collaboration from France -- 1 HPC Softwares in Three Phases -- 2 Trilateral Projects in SPPEXA and Their Impacts -- 3 What Will Be Next? -- A Perspective on the SPPEXA Collaboration from Japan -- ppOpen-HPC and ESSEX-II (Kengo Nakajima) -- Xevolver and ExaFSA (Hiroyuki Takizawa) -- The Role of Japan in HPC Collaborations -- Part II SPPEXA Project Consortia Reports -- ADA-FS: Advanced Data Placement via Ad hoc File Systems at Extreme Scales -- 1 Introduction -- 2 GekkoFS: A Temporary Burst Buffer File System for HPC -- 2.1 Related Work -- 2.1.1 General-Purpose Parallel File Systems -- 2.1.2 Node-Local Burst Buffers -- 2.1.3 Metadata Scalability -- 2.2 Design -- 2.2.1 POSIX Semantics -- 2.2.2 Architecture -- 2.2.3 GekkoFS Client -- 2.2.4 GekkoFS Daemon -- 2.3 Evaluation -- 2.3.1 Experimental Setup -- 2.3.2 Metadata Performance -- 2.3.3 Data Performance -- 3 Scheduling and Deployment -- 3.1 Walltime Prediction -- 3.2 Node Prediction -- 3.3 On Demand Burst Buffer Plugin -- 3.4 Related Work -- 4 Resource and Topology Detection -- 4.1 Design and Implementation -- 5 On Demand File System in HPC Environment -- 5.1 Deploying on Demand File System -- 5.2 Benchmarks -- 5.3 Concurrent Data Staging -- 6 GekkoFS on NVMe-Based Storage Systems -- 7 Conclusion -- References
- AIMES: Advanced Computation and I/O Methods for Earth-System Simulations -- 1 Introduction -- 1.1 The AIMES Project -- 2 Related Work -- 2.1 Domain-Specific Languages -- 2.2 Compression -- 3 Towards Higher-Level Code Design -- 3.1 Our Approach -- 3.2 Extending Modeling Language -- 3.2.1 Extensions and Domain-Specific Concepts -- 3.3 Code Example -- 3.4 Workflow and Tool Design -- 3.5 Configuration -- 3.6 Estimating DSL Impact on Code Quality and Development Costs -- 4 Evaluating Performance of our DSL -- 4.1 Test Applications -- 4.2 Test Systems -- 4.3 Evaluating Blocking -- 4.4 Evaluating Vectorization and Memory Layout Optimization -- 4.5 Evaluating Inter-Kernel Optimization -- 4.6 Scaling with Multiple-Node Runs -- 4.7 Exploring the Use of Alternative DSLs: GridTools -- 5 Massive I/O and Compression -- 5.1 The Scientific Compression Library (SCIL) -- 5.2 Supported Quantities -- 5.3 Compression Chain -- 5.4 Algorithms -- 6 Evaluation of the Compression Library SCIL -- 6.1 Single Core Performance -- 6.2 Compression in HDF5 and NetCDF -- 7 Standardized Icosahedral Benchmarks -- 7.1 IcoAtmosBenchmark v1: Kernels from Icosahedral Atmospheric Models -- 7.1.1 Documentation -- 8 Summary and Conclusion -- References -- DASH: Distributed Data Structures and Parallel Algorithms in a Global Address Space -- 1 Introduction -- 2 The DASH Runtime System -- 2.1 Tasks with Global Dependencies -- 2.1.1 Distributed Data Dependencies -- 2.1.2 Ordering of Dependencies -- 2.1.3 Implementation -- 2.1.4 Results: Blocked Cholesky Factorization -- 2.1.5 Related Work -- 2.2 Dynamic Hardware Topology -- 2.2.1 Locality-Aware Virtual Process Topology -- 2.2.2 Locality Domain Graph -- 2.2.3 Dynamic Hardware Locality -- 2.2.4 Supporting Portable Efficiency -- 3 DASH C++ Data Structures and Algorithms -- 3.1 Smart Data Structures: Halo -- 3.1.1 Stencil Specification
- 6.1 Modelling and Numerical Approach -- 6.2 Performance Optimisations -- 6.3 Scalability and Performance Tests -- 7 Conclusion -- References -- ExaFSA: Parallel Fluid-Structure-Acoustic Simulation -- 1 Introduction -- 2 Model -- 2.1 Governing Equations -- 3 Solvers and Their Optimization -- 3.1 FASTEST -- 3.2 Ateles -- 3.3 CalculiX -- 4 A Black-Box Partitioned Coupling Approach Using preCICE -- 4.1 (Iterative) Coupling -- 4.2 Data Mapping -- 4.3 Communication -- 4.4 Load Balancing -- 4.5 Isolated Performance of preCICE -- 5 Black-Box Coupling Versus White-Box Coupling with APESMate -- 6 Results -- 6.1 Flow over a Fence Test Case Setup -- 6.2 Fluid-Acoustics Coupling with FASTEST and Ateles -- 6.3 Fluid-Acoustics Coupling with Only Ateles -- 7 Summary and Conclusion -- References -- EXAHD: A Massively Parallel Fault Tolerant Sparse Grid Approach for High-Dimensional Turbulent Plasma Simulations -- 1 Introduction -- 2 Theory and Mathematical Model -- 2.1 The Sparse Grid Combination Technique -- 2.2 Plasma Physics with GENE -- 2.3 Fault Tolerance -- 2.3.1 Fault Tolerant Combination Technique -- 2.3.2 Fault Recovery Algorithms -- 3 Implementation -- 3.1 Parallel Implementation of the Combination Technique -- 3.2 Load Balancing -- 3.3 Fault Tolerant Combination Technique -- 4 Numerical Results -- 4.1 Convergence -- 4.1.1 Linear Runs -- 4.1.2 Nonlinear Runs -- 4.2 Scaling Analysis -- 4.2.1 Load Balancing -- 4.3 Fault Tolerance -- 4.3.1 Fault Tolerant Combination Technique -- 4.3.2 libSpina -- 5 Conclusion -- References -- EXAMAG: Towards Exascale Simulations of the Magnetic Universe -- 1 Introduction -- 2 The IllustrisTNG and Auriga Simulations -- 3 Discontinuous Galerkin Hydrodynamics for Astrophysical Applications -- 3.1 The Discontinuous Galerkin Method on a Cartesian Mesh with Automatic Mesh Refinement
- 3.2 The Discontinuous Galerkin Method on a Moving Mesh

