A survey on dataset quality in machine learning


Bibliographic details
Published in: Information and Software Technology, Volume 162; p. 107268
Main authors: Gong, Youdi; Liu, Guangzhen; Xue, Yunzhi; Li, Rui; Meng, Lingzhong
Format: Journal Article
Language: English
Published: Elsevier B.V., 1 October 2023
ISSN: 0950-5849, 1873-6025
Description
Summary: With the rise of big data, dataset quality has become a crucial factor in the performance of machine learning models, and high-quality datasets are essential for realizing the value of data. This survey article summarizes research on dataset quality in machine learning, including definitions of related concepts, an analysis of quality issues and risks, and a review of dataset quality dimensions and metrics across the dataset lifecycle as reported in the literature. Furthermore, the article introduces a comprehensive quality evaluation process, which includes a framework for dataset quality evaluation with dimensions and metrics, computation methods for the quality metrics, and assessment models. These studies provide guidance for evaluating dataset quality in machine learning, which can help improve the accuracy, efficiency, and generalization ability of machine learning models and promote the development and application of artificial intelligence technology.
DOI: 10.1016/j.infsof.2023.107268