Filling the Void: Data-Driven Machine Learning-based Reconstruction of Sampled Spatiotemporal Scientific Simulation Data

Bibliographic details
Published in: SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 290-299
Main authors: Biswas, Ayan; Mishra, Aditi; Majumder, Meghanto; Hazarika, Subhashis; Most, Alexander; Castorena, Juan; Bryan, Christopher; McCormick, Patrick; Ahrens, James; Lawrence, Earl; Hagberg, Aric
Format: Conference paper
Language: English
Publication details: IEEE, 17 November 2024
Description
Summary: As high-performance computing systems continue to advance, the gap between computing performance and I/O capabilities is widening. This bottleneck limits the storage capabilities of increasingly large-scale simulations, which generate data at never-before-seen granularities while only being able to store a small subset of the raw data. Recently, strategies for data-driven sampling have been proposed to intelligently sample the data in a way that achieves high data reduction rates while preserving important regions or features with high fidelity. However, a thorough analysis of how such intelligent samples can be used for data reconstruction is lacking. We propose a data-driven machine learning approach based on training neural networks to reconstruct full-scale datasets based on a simulation's sampled output. Compared to current state-of-the-art reconstruction approaches such as Delaunay triangulation-based linear interpolation, we demonstrate that our machine learning-based reconstruction has several advantages, including reconstruction quality, time-to-reconstruct, and knowledge transfer to unseen timesteps and grid resolutions. We propose and evaluate strategies that balance the sampling rates with model training (pretraining and fine-tuning) and data reconstruction time to demonstrate how such machine learning approaches can be tailored for both speed and quality for the reconstruction of grid-based datasets.
DOI: 10.1109/SCW63240.2024.00045