A survey on dataset quality in machine learning

Bibliographic details
Published in: Information and Software Technology, Vol. 162, Art. 107268
Main authors: Gong, Youdi; Liu, Guangzhen; Xue, Yunzhi; Li, Rui; Meng, Lingzhong
Format: Journal article
Language: English
Published: Elsevier B.V., 1 October 2023
ISSN: 0950-5849, 1873-6025
Abstract: With the rise of big data, dataset quality has become a crucial factor in the performance of machine learning models, and high-quality datasets are essential for realizing the value of data. This survey summarizes research on dataset quality in machine learning. It covers the definition of related concepts, an analysis of quality issues and risks, and a review of the dataset quality dimensions and metrics reported in the literature, organized by stage of the dataset lifecycle. The article also introduces a comprehensive quality evaluation process, comprising an evaluation framework with dimensions and metrics, computation methods for the quality metrics, and assessment models. These studies offer guidance for evaluating dataset quality in machine learning, which can help improve the accuracy, efficiency, and generalization ability of machine learning models and promote the development and application of artificial intelligence technology.
Author affiliations: all five authors (Gong, Youdi; Liu, Guangzhen; Xue, Yunzhi; Li, Rui; Meng, Lingzhong) are with the Institute of Software, Chinese Academy of Sciences, Beijing, 100190, China.
Email (Meng, Lingzhong): lingzhong@iscas.ac.cn
Copyright: 2023 The Authors
DOI: 10.1016/j.infsof.2023.107268
Keywords: Dataset quality, Machine learning, Dataset
License: This is an open access article under the CC BY-NC-ND license.
  publication-title: J. Big Data
  doi: 10.1186/s40537-021-00439-5
– year: 2023
  ident: 10.1016/j.infsof.2023.107268_b55
  article-title: Textured mesh quality assessment: Large-scale dataset and deep learning-based quality metric
  publication-title: ACM Trans. Graph.
  doi: 10.1145/3592786
– volume: 2020
  year: 2020
  ident: 10.1016/j.infsof.2023.107268_b33
  article-title: Conceptual cognitive modeling for fine-grained annotation quality assessment of object detection datasets
  publication-title: Discrete Dyn. Nat. Soc.
– year: 2014
  ident: 10.1016/j.infsof.2023.107268_b16
– ident: 10.1016/j.infsof.2023.107268_b23
– year: 2011
  ident: 10.1016/j.infsof.2023.107268_b25
– ident: 10.1016/j.infsof.2023.107268_b50
– ident: 10.1016/j.infsof.2023.107268_b7
– year: 1996
  ident: 10.1016/j.infsof.2023.107268_b15
– year: 2013
  ident: 10.1016/j.infsof.2023.107268_b6
  article-title: Recursive deep models for semantic compositionality over a sentiment treebank
– ident: 10.1016/j.infsof.2023.107268_b47
  doi: 10.1007/3-540-45153-6_7
– volume: 42
  start-page: 1
  issue: 3
  year: 2016
  ident: 10.1016/j.infsof.2023.107268_b41
  article-title: Detecting, tracing, and monitoring architectural tactics in code
  publication-title: IEEE Trans Softw Eng
  doi: 10.1109/TSE.2015.2479217
– year: 2002
  ident: 10.1016/j.infsof.2023.107268_b18
– ident: 10.1016/j.infsof.2023.107268_b26
  doi: 10.1109/ICBDCI.2019.8686099
– year: 2021
  ident: 10.1016/j.infsof.2023.107268_b28
– volume: vol. 400
  year: 2013
  ident: 10.1016/j.infsof.2023.107268_b5
  article-title: Semi-supervised text categorization by considering sufficiency and diversity
– year: 2015
  ident: 10.1016/j.infsof.2023.107268_b22
  article-title: MUSAN: A music, speech, and noise corpus
  publication-title: Comput. Sci.
– year: 2020
  ident: 10.1016/j.infsof.2023.107268_b51
– volume: 1995
  start-page: 331
  year: 1995
  ident: 10.1016/j.infsof.2023.107268_b2
  article-title: NewsWeeder: Learning to filter netnews
  publication-title: Mach. Learn. Proc.
– year: 2021
  ident: 10.1016/j.infsof.2023.107268_b31
– year: 2018
  ident: 10.1016/j.infsof.2023.107268_b37
  article-title: Construction of big data quality measurement model
– year: 2017
  ident: 10.1016/j.infsof.2023.107268_b40
  article-title: Datasets from fifteen years of automated requirements traceability research: Current state, characteristics, and quality
– volume: 9
  issue: 1
  year: 2019
  ident: 10.1016/j.infsof.2023.107268_b32
  article-title: Automated cleaning of identity label noise in a large face dataset with quality control
  publication-title: IET Biometrics
– year: 2009
  ident: 10.1016/j.infsof.2023.107268_b13
  article-title: Learning multiple layers of features from tiny images
– ident: 10.1016/j.infsof.2023.107268_b4
  doi: 10.18653/v1/D19-1018
– ident: 10.1016/j.infsof.2023.107268_b11
– start-page: 144
  issue: 2
  year: 2004
  ident: 10.1016/j.infsof.2023.107268_b69
  article-title: Fuzzy comprehensive evaluation model based on improved analytic hierarchy process
  publication-title: J. Hydraul. Eng.
– volume: 38
  start-page: 170
  issue: 02
  year: 2021
  ident: 10.1016/j.infsof.2023.107268_b48
  article-title: A new method for measuring the distribution consistency of mixed-attribute datasets
  publication-title: J. Shenzhen Univ. (Sci. Technol. Ed.)
– ident: 10.1016/j.infsof.2023.107268_b3
  doi: 10.1145/1060745.1060764
– volume: 24
  start-page: 7232
  issue: 10
  year: 2018
  ident: 10.1016/j.infsof.2023.107268_b29
  article-title: Evaluating the quality of datasets in software engineering
  publication-title: J. Comput. Theor. Nanosci.
– year: 2021
  ident: 10.1016/j.infsof.2023.107268_b52
– volume: 88
  start-page: 303
  issue: 2
  year: 2010
  ident: 10.1016/j.infsof.2023.107268_b9
  publication-title: Int. J. Comput. Vis.
  doi: 10.1007/s11263-009-0275-4
– year: 2014
  ident: 10.1016/j.infsof.2023.107268_b39
  article-title: Supporting traceability through affinity mining
– ident: 10.1016/j.infsof.2023.107268_b21
– ident: 10.1016/j.infsof.2023.107268_b46
– volume: 89
  start-page: 548
  issue: DEC.
  year: 2018
  ident: 10.1016/j.infsof.2023.107268_b64
  article-title: Context-aware data quality assessment for big data
  publication-title: Future Gener. Comput. Syst.
  doi: 10.1016/j.future.2018.07.014
– year: 2015
  ident: 10.1016/j.infsof.2023.107268_b17
  article-title: Librispeech: An ASR corpus based on public domain audio books
– ident: 10.1016/j.infsof.2023.107268_b1
  doi: 10.1109/INNOVATIONS.2018.8605945
– year: 2019
  ident: 10.1016/j.infsof.2023.107268_b68
– volume: 31
  start-page: 302
  issue: 2
  year: 2020
  ident: 10.1016/j.infsof.2023.107268_b49
  article-title: Survey of data annotation
  publication-title: J. Softw.
StartPage 107268
SubjectTerms Dataset
Dataset quality
Machine Learning
Title A survey on dataset quality in machine learning
URI https://dx.doi.org/10.1016/j.infsof.2023.107268
Volume 162