Visual Analytics System of Comprehensive Data Quality Improvement for Machine Learning using Data- and Process-driven Strategies

Bibliographic Details
Published in: 2022 IEEE International Conference on Big Data (Big Data), pp. 396-401
Main Authors: Hong, Hyein; Yoo, Sangbong; Jin, Yejin; Yoon, Chanyoung; Yim, Soobin; Choi, Seokhwan; Jang, Yun
Format: Conference Proceeding
Language: English
Published: IEEE, 17 December 2022
Description
Summary: Machine learning (ML) models are used to mine inconspicuous information in big data. Both the model and the data quality influence the performance of an ML model. However, modifying the ML model while measuring performance is impractical, and low-quality data causes biased model training. Therefore, improving the data quality is essential. Visual analytics systems supporting data quality improvement (DQI) have been proposed in the past. However, in those studies, it is difficult for users to assess comprehensive data quality improvement methods for machine learning and to determine an appropriate data quality improvement process. In this paper, we propose a novel visual analytics system for managing the quality of data used in machine learning models.
DOI: 10.1109/BigData55660.2022.10020585