Visual Analytics System of Comprehensive Data Quality Improvement for Machine Learning using Data- and Process-driven Strategies
| Published in: | 2022 IEEE International Conference on Big Data (Big Data), pp. 396-401 |
|---|---|
| Main authors: | , , , , , , |
| Format: | Conference paper |
| Language: | English |
| Publication details: | IEEE, 17 December 2022 |
| Summary: | Machine learning (ML) models are used to mine inconspicuous information in big data. The performance of an ML model depends on both the model and the quality of its data. However, modifying the ML model while measuring performance is impractical, and low-quality data causes biased model training. Therefore, improving the data quality is essential. Visual analytics systems supporting data quality improvement (DQI) have been proposed in the past. However, in those studies it is difficult for users to assess comprehensive data quality improvement methods for machine learning and to determine an appropriate data quality improvement process. In this paper, we propose a novel visual analytics system for managing the quality of data used in machine learning models. |
| DOI: | 10.1109/BigData55660.2022.10020585 |