Streaming Compression of Scientific Data via Weak-SINDy

Bibliographic Details
Title: Streaming Compression of Scientific Data via Weak-SINDy
Authors: Benjamin P. Russo, M. Paul Laiu, Richard Archibald
Source: SIAM Journal on Scientific Computing. 47:C207-C234
Publication Status: Preprint
Publisher Information: Society for Industrial & Applied Mathematics (SIAM), 2025.
Publication Year: 2025
Subject Terms: FOS: Computer and information sciences, Artificial intelligence, Computer Science - Machine Learning, Ridge regression, shrinkage estimators (Lasso), Finite element, Rayleigh-Ritz, Galerkin and collocation methods for ordinary differential equations, Dynamical Systems (math.DS), streaming data, online compression, surrogate modeling, Machine Learning (cs.LG), Approximation by polynomials, proper orthogonal decomposition, 37M10, 62J07, 65L60, 41A10, 68T99, 68V99, FOS: Mathematics, Time series analysis of dynamical systems, Mathematics - Dynamical Systems, Computer science support for mathematical research and practice
Description: In this paper, a streaming weak-SINDy algorithm is developed specifically for compressing streaming scientific data. The production of scientific data, either via simulation or experiments, is undergoing a stage of exponential growth, which makes data compression important and often necessary for storing and utilizing large scientific data sets. As opposed to classical "offline" compression algorithms that perform compression on a readily available data set, streaming compression algorithms compress data "online" while the data generated from simulation or experiments is still flowing through the system. This feature makes streaming compression algorithms well-suited for scientific data compression, where storing the full data set offline is often infeasible. This work proposes a new streaming compression algorithm, streaming weak-SINDy, which takes advantage of the underlying data characteristics during compression. The streaming weak-SINDy algorithm constructs feature matrices and target vectors in the online stage via a streaming integration method in a memory-efficient manner. The feature matrices and target vectors are then used in the offline stage to build a model through a regression process that aims to recover the equations that govern the evolution of the data. For compressing high-dimensional streaming data, we adopt a streaming proper orthogonal decomposition (POD) process to reduce the data dimension and then use the streaming weak-SINDy algorithm to compress the temporal data of the POD expansion. We propose modifications to the streaming weak-SINDy algorithm to accommodate the dynamically updated POD basis. By combining the model built by the streaming weak-SINDy algorithm with a small number of data samples, the full data flow can be reconstructed accurately at a low memory cost, as shown in the numerical tests.
Document Type: Article
File Description: application/xml
Language: English
ISSN: 1095-7197, 1064-8275
DOI: 10.1137/23m1599331
DOI: 10.48550/arxiv.2308.14962
Access URL: http://arxiv.org/abs/2308.14962
Rights: arXiv Non-Exclusive Distribution
Accession Number: edsair.doi.dedup.....77a0486f1f89bcca25cccfdc1c427b54
Database: OpenAIRE
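Illustrative sketch (not part of the catalogued record): the description mentions an online stage that accumulates feature matrices and target vectors by streaming integration, followed by an offline regression. The Python snippet below is a minimal sketch of that general idea for a scalar ODE, not the authors' implementation. The bump test functions, the candidate library [1, x, x^2], the Riemann-sum quadrature, the plain least-squares fit, and every name in the code are assumptions made for illustration; the streaming POD step for high-dimensional data is omitted.

```python
import numpy as np


def bump(t, centers, radius):
    """Assumed test functions phi_k(t) = (1 - s^2)^2 on |s| < 1, with
    s = (t - c_k) / radius, and their time derivatives. Both vanish at
    the edge of the support, so integration by parts has no boundary term."""
    s = (t - centers) / radius
    inside = np.abs(s) < 1.0
    phi = np.where(inside, (1.0 - s**2) ** 2, 0.0)
    dphi = np.where(inside, -4.0 * s * (1.0 - s**2) / radius, 0.0)
    return phi, dphi


def library(x):
    """Hypothetical candidate library for a scalar state: [1, x, x^2]."""
    return np.array([1.0, x, x * x])


class StreamingWeakSindy:
    """Accumulates G[k, j] ~ int phi_k * theta_j(x) dt and
    b[k] ~ -int phi_k' * x dt one sample at a time (uniform step dt)."""

    def __init__(self, centers, radius, dt, n_features=3):
        self.centers = np.asarray(centers, dtype=float)
        self.radius = radius
        self.dt = dt
        self.G = np.zeros((len(self.centers), n_features))
        self.b = np.zeros(len(self.centers))

    def update(self, t, x):
        # Online stage: the sample (t, x) is folded into G and b, then discarded.
        phi, dphi = bump(t, self.centers, self.radius)
        self.G += self.dt * np.outer(phi, library(x))
        self.b -= self.dt * dphi * x

    def fit(self):
        # Offline stage: ordinary least squares (a sparse or ridge solver
        # could be substituted here).
        w, *_ = np.linalg.lstsq(self.G, self.b, rcond=None)
        return w


# Synthetic stream: x(t) = exp(-2 t), which satisfies dx/dt = -2 x.
dt = 1e-3
model = StreamingWeakSindy(centers=np.linspace(0.3, 1.7, 8), radius=0.3, dt=dt)
for i in range(2001):  # t runs from 0 to 2; all supports lie inside [0, 2]
    t = i * dt
    model.update(t, np.exp(-2.0 * t))

w = model.fit()
print(w)
```

In this toy setting only the 8-by-3 matrix G and the length-8 vector b are kept in memory, regardless of how many samples pass through the stream. The fitted coefficients are expected to be close to [0, -2, 0], that is, the linear term of dx/dt = -2x.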