Assessing the documentation of publicly available medical image and signal datasets and their impact on bias using the BEAMRAD tool

Detailed bibliography
Published in:	Scientific Reports, Volume 14, Issue 1, p. 31846 - 15
Main authors:	Galanty, Maria, Luitse, Dieuwertje, Noteboom, Sijm H., Croon, Philip, Vlaar, Alexander P., Poell, Thomas, Sanchez, Clara I., Blanke, Tobias, Išgum, Ivana
Medium:	Journal Article
Language:	English
Published:	London: Nature Publishing Group UK (Nature Portfolio), 30 December 2024
Subjects:	639/705/1046; 639/705/117; 692/700; Artificial Intelligence; Bias; Data acquisition; Data processing; Datasets; Datasets as Topic; Deep Learning; Diagnostic Imaging - methods; Documentation; Documentation - methods; EKG; Electrocardiography - methods; Humanities and Social Sciences; Humans; Magnetic resonance imaging; Magnetic Resonance Imaging - methods; multidisciplinary; Photography; Reproducibility of Results; Science; Science (multidisciplinary)
ISSN:	2045-2322

Description
Abstract:	Medical datasets are vital for advancing Artificial Intelligence (AI) in healthcare. Yet biases in the datasets on which deep-learning models are trained can compromise the reliability of those models. This study investigates biases stemming from dataset-creation practices. Drawing on existing guidelines, we first developed the BEAMRAD tool to assess the documentation of public Magnetic Resonance Imaging (MRI), Color Fundus Photography (CFP), and Electrocardiogram (ECG) datasets. In doing so, we provide an overview of the biases that may emerge due to inadequate dataset documentation. Second, we examine the current state of documentation for public medical image and signal data. Our research reveals substantial variance in the documentation of image and signal datasets, even though guidelines have been developed in medical imaging. This indicates that dataset documentation is subject to individual discretionary decisions. Furthermore, we find that aspects such as hardware and data acquisition details are commonly documented, while information regarding data annotation practices, annotation error quantification, or data limitations is not consistently reported. This may have considerable implications for the ability of data users to detect potential sources of bias through these aspects and to develop reliable and robust models that can be adapted for clinical practice.
DOI:	10.1038/s41598-024-83218-5