Parallelizable adjoint stencil computations using transposed forward-mode algorithmic differentiation

Bibliographic Details
Published in:	Optimization Methods & Software, Vol. 33, No. 4-6, pp. 672-693
Main authors:	Hückelheim, J.C., Hovland, P.D., Strout, M.M., Müller, J.-D.
Format:	Journal Article
Language:	English
Published:	Abingdon: Taylor & Francis (Taylor & Francis Ltd), 2 November 2018
Subjects:	Accelerators; algorithmic differentiation; Algorithms; Computer memory; Differentiation; discrete adjoints; OpenMP; Optimization; Optimization techniques; Parallel processing; reverse mode; shared-memory parallelism; Solvers; Tiling
ISSN:	1055-6788, 1029-4937

Description
Abstract:	Algorithmic differentiation (AD) is a tool for generating discrete adjoint solvers, which efficiently compute gradients of functions with many inputs, for example for use in gradient-based optimization. AD is often applied to large computations such as stencil operators, which are an important part of most structured-mesh PDE solvers. Stencil computations are often parallelized, for example by using OpenMP, and optimized by using techniques such as cache-blocking and tiling to fully utilize multicore CPUs and many-core accelerators and GPUs. Differentiating these codes with conventional reverse-mode AD results in adjoint codes that cannot be expressed as stencil operations and may not be easily parallelizable. They thus leave most of the compute power of modern architectures unused. We present a method that combines forward-mode AD and loop transformation to generate adjoint solvers that use the same memory access pattern as the original computation that they are derived from and can benefit from the same optimization techniques. The effectiveness of this method is demonstrated by generating a scalable adjoint CFD solver for multicore CPUs and Xeon Phi accelerators.
Bibliography:	AC02-06CH11357; National Science Foundation (NSF); European Commission - Community Research and Development Information Service (CORDIS) - Seventh Framework Programme (FP7); USDOE Office of Science - Office of Advanced Scientific Computing Research
DOI:	10.1080/10556788.2018.1435654
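
The contrast the abstract draws between a conventional reverse-mode adjoint and an adjoint with the same memory access pattern as the original stencil can be illustrated with a minimal sketch. It assumes a one-dimensional, constant-coefficient, three-point stencil; the function names and the coefficients `a`, `b`, `c` are hypothetical and do not come from the article. The gather form here is written by hand for this linear case and is not the article's generation procedure, which combines forward-mode AD with loop transformation.

```python
def stencil(u, a, b, c):
    """Forward 3-point stencil on interior points; boundary outputs stay 0."""
    n = len(u)
    v = [0.0] * n
    for i in range(1, n - 1):
        v[i] = a * u[i - 1] + b * u[i] + c * u[i + 1]
    return v


def adjoint_scatter(vbar, a, b, c):
    """Conventional reverse-mode adjoint: each iteration scatters into three
    entries of ubar, so neighbouring iterations write to the same location."""
    n = len(vbar)
    ubar = [0.0] * n
    for i in range(1, n - 1):
        ubar[i - 1] += a * vbar[i]
        ubar[i] += b * vbar[i]
        ubar[i + 1] += c * vbar[i]
    return ubar


def adjoint_gather(vbar, a, b, c):
    """Same adjoint rewritten as a gather: iteration j writes only ubar[j]
    and reads neighbouring vbar entries, as the forward stencil does."""
    n = len(vbar)

    def w(i):
        # vbar contributes only where the forward loop produced an output.
        return vbar[i] if 1 <= i <= n - 2 else 0.0

    return [c * w(j - 1) + b * w(j) + a * w(j + 1) for j in range(n)]
```

In the scatter version, parallelizing the loop over `i` would need atomics or reductions because several iterations update the same `ubar` entry. In the gather version, each iteration owns exactly one output entry, which is the property that lets stencil optimizations such as tiling carry over.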