How Can We Improve Data Quality for Machine Learning? A Visual Analytics System using Data and Process-driven Strategies

Bibliographic Details
Published in: IEEE Pacific Visualization Symposium, pp. 112-121
Main Authors: Hong, Hyein; Yoo, Sangbong; Jin, Yejin; Jang, Yun
Format: Conference Proceeding
Language: English
Published: IEEE 01.04.2023
ISSN: 2165-8773
Description
Summary: Machine learning (ML) models are used to mine inconspicuous information in big data. The performance of an ML model depends on both the model and the quality of the data. However, modifying the model is inefficient because it is a black box, and low-quality data tends to bias what the model learns. Therefore, it is crucial to improve data quality. The appropriate improvement technique depends on the condition of the data and the quality issues present, so improving data quality is time-consuming and challenging for users with insufficient knowledge of the data. Visual analytics techniques have been proposed to support decisions about improving data quality. However, existing approaches make it complicated for users to devise a comprehensive data quality improvement (DQI) method for generating data suitable for ML models, and they remain limited in that users must consider all combinations of DQI processes themselves. This paper presents a novel visual analytics system that manages data quality for use in ML models. The proposed system suggests an optimal quality improvement process and uses visualization techniques such as heatmaps, histograms, and scatter plots to support DQI.
DOI: 10.1109/PacificVis56936.2023.00020