Asynchronous distributed-memory task-parallel algorithm for compressible flows on unstructured 3D Eulerian grids


Detailed bibliography
Published in:	Advances in Engineering Software (1992), Vol. 160, p. 102962
Main authors:	Bakosi, J., Bird, R., Gonzalez, F., Junghans, C., Li, W., Luo, H., Pandare, A., Waltz, J.
Medium:	Journal Article
Language:	English
Published:	United States, Elsevier Ltd, 1 October 2021
Subjects:	Automatic load balancing; Charm++; Engineering; Finite element method; Flux-corrected transport; Mathematics and computing; Shock hydrodynamics
ISSN:	0965-9978

Description
Summary:
• A finite element method for the simulation of compressible flows using the Charm++ runtime system has been implemented.
• Strong and weak scalability up to and computational cells, respectively, have been demonstrated.
• The benefits of automatic load balancing in Charm++ have been demonstrated.
• The full source code is available at quinoacomputing.org.

We discuss the implementation of a finite element method used to numerically solve the Euler equations of compressible flows on top of an asynchronous runtime system (RTS). The algorithm is implemented for distributed-memory machines using stationary unstructured 3D meshes, and it combines data- and task-parallelism on top of the Charm++ RTS. Charm++'s execution model is asynchronous by default, allowing arbitrary overlap of computation and communication. Task-parallelism allows parts of an algorithm to be scheduled either independently of, or dependent on, each other. Built-in automatic load balancing continuously redistributes the computational load by migrating work units based on real-time measurements of CPU load. The RTS also provides automatic checkpointing, fault tolerance, and resilience against hardware failure, and it supports power- and energy-aware computation. We demonstrate scalability up to 25×10⁹ cells on O(10⁴) compute cores, as well as the benefits of automatic load balancing for irregular workloads. The full source code with documentation is available at https://quinoacomputing.org.
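The summary describes load balancing as migrating work units according to measured CPU load. As a rough illustration of that idea only, the sketch below shows a greedy (longest-processing-time-first) remapping decision: the heaviest units are placed first, each onto the currently least-loaded processing element (PE). The types `WorkUnit`, `greedyRebalance`, and `maxLoad` are hypothetical names introduced here. The sketch is not claimed to be the strategy used in the paper or in Charm++. In Charm++ the runtime performs the load measurement and object migration itself, and this code models only the mapping step.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical migratable work unit (e.g., a mesh partition) together with
// the CPU load measured for it during the last balancing interval.
struct WorkUnit {
  std::size_t id;       // assumed to be unique and in 0..N-1
  double measuredLoad;  // e.g., seconds of CPU time
};

// Greedy longest-processing-time-first remapping (illustrative assumption).
// Returns assignment[id] = index of the PE the unit should migrate to.
std::vector<std::size_t> greedyRebalance(std::vector<WorkUnit> units,
                                         std::size_t numPEs) {
  assert(numPEs > 0);
  // Place the heaviest units first (sorting a local copy).
  std::sort(units.begin(), units.end(),
            [](const WorkUnit& a, const WorkUnit& b) {
              return a.measuredLoad > b.measuredLoad;
            });

  // Min-heap of (accumulated load, PE index): top() is the least-loaded PE.
  using Entry = std::pair<double, std::size_t>;
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pes;
  for (std::size_t p = 0; p < numPEs; ++p) pes.push({0.0, p});

  std::vector<std::size_t> assignment(units.size());
  for (const WorkUnit& u : units) {
    Entry least = pes.top();
    pes.pop();
    assignment[u.id] = least.second;  // map unit u onto this PE
    least.first += u.measuredLoad;    // that PE now carries u's load
    pes.push(least);
  }
  return assignment;
}

// Maximum per-PE load of an assignment: the quantity balancing tries to reduce.
double maxLoad(const std::vector<WorkUnit>& units,
               const std::vector<std::size_t>& assignment,
               std::size_t numPEs) {
  std::vector<double> load(numPEs, 0.0);
  for (const WorkUnit& u : units) load[assignment[u.id]] += u.measuredLoad;
  return *std::max_element(load.begin(), load.end());
}
```

For example, five units with loads 5, 4, 3, 3, and 1 spread over two PEs end up with a per-PE load of 8 on each PE, which matches the total load of 16 divided evenly. A greedy heuristic is cheap enough to rerun at every balancing step, but it ignores communication between units, which a production strategy would usually also take into account.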
Bibliographic notes:	USDOE Laboratory Directed Research and Development (LDRD) Program 89233218CNA000001; LA-UR-20-21450; LDRD-20170127-ER
DOI:	10.1016/j.advengsoft.2020.102962