Visual Analytics System of Comprehensive Data Quality Improvement for Machine Learning using Data- and Process-driven Strategies

Bibliographic Details
Published in: 2022 IEEE International Conference on Big Data (Big Data), pp. 396-401
Main Authors: Hong, Hyein; Yoo, Sangbong; Jin, Yejin; Yoon, Chanyoung; Yim, Soobin; Choi, Seokhwan; Jang, Yun
Format: Conference paper
Language: English
Published: IEEE, 17 December 2022
Description
Summary: Machine learning (ML) models are used to mine inconspicuous information in big data. Both the model and the data quality influence the performance of an ML model. However, modifying the ML model while measuring performance is impractical, and low-quality data causes biased model training. Therefore, improving the data quality is essential. Visual analytics systems supporting data quality improvement (DQI) have been proposed in the past. However, in those studies, it is difficult for users to assess comprehensive data quality improvement methods for machine learning and to determine an appropriate data quality improvement process. In this paper, we propose a novel visual analytics system for managing the quality of data used in machine learning models.
DOI: 10.1109/BigData55660.2022.10020585