Visual Analytics System of Comprehensive Data Quality Improvement for Machine Learning using Data- and Process-driven Strategies

Detailed bibliography
Published in: 2022 IEEE International Conference on Big Data (Big Data), pp. 396-401
Main authors: Hong, Hyein; Yoo, Sangbong; Jin, Yejin; Yoon, Chanyoung; Yim, Soobin; Choi, Seokhwan; Jang, Yun
Medium: Conference paper
Language: English
Publication details: IEEE, 17.12.2022
Description
Summary: Machine learning (ML) models are used to mine inconspicuous information in big data. The quality of both the model and the data influences the performance of an ML model. However, modifying the ML model while measuring performance is impractical, and low-quality data causes biased model training. Therefore, improving the data quality is essential. Visual analytics systems supporting DQI (Data Quality Improvement) have been proposed in the past. However, in these studies, it is difficult for users to assess comprehensive data quality improvement methods for machine learning and to determine an appropriate data quality improvement process. In this paper, we propose a novel visual analytics system for managing the quality of data used in machine learning models.
DOI: 10.1109/BigData55660.2022.10020585