Survey: Time-Series Data Preprocessing: A Survey and an Empirical Analysis

Detailed bibliography
Title: Survey: Time-Series Data Preprocessing: A Survey and an Empirical Analysis
Authors: Tawalkuli, Amal; Havers, Bastian, 1991; Gulisano, Vincenzo Massimiliano, 1984; Kaiser, Daniel; Engel, Thomas
Source: AutoSPADA (Automotive Stream Processing and Distributed Analytics) OODIDA Phase 2 Journal of Engineering Research. 13(2):674-711
Subjects: Data Preprocessing, Data Quality
Description: Data are naturally collected in their raw state and must undergo a series of preprocessing steps to reach the input state required by Artificial Intelligence (AI) and other applications. The data preprocessing phase is not only necessary to fit input requirements but also effective in improving AI training efficiency and output accuracy. Data preprocessing is a time-consuming and complex phase that lacks a unified and structured approach. We survey data preprocessing techniques under different categories to provide an extended and structured scope of data preprocessing relevant to numerical time-series data. We also provide an empirical analysis of the impact of preprocessing techniques on the quality of the data and on the performance of AI algorithms. In addition, we discuss the feasibility of distributing some of the surveyed techniques to the edge. Leveraging edge computing to distribute data preprocessing reduces the workload on central systems, creates more manageable data lakes, reduces the consumption of resources (e.g., energy), and enables EdgeAI.
File description: electronic
Access URL: https://research.chalmers.se/publication/540273
https://research.chalmers.se/publication/540495
https://research.chalmers.se/publication/540495/file/540495_Fulltext.pdf
Database: SwePub
ISSN: 2307-1885
2307-1877
DOI: 10.1016/j.jer.2024.02.018