Quality assessment of real-world data repositories across the data life cycle: A literature review

Bibliographic details
Published in: Journal of the American Medical Informatics Association : JAMIA, Vol. 28, No. 7, p. 1591
Main authors: Liaw, Siaw-Teng; Guo, Jason Guan Nan; Ansari, Sameera; Jonnagaddala, Jitendra; Godinho, Myron Anthony; Borelli, Alder Jose; de Lusignan, Simon; Capurro, Daniel; Liyanage, Harshana; Bhattal, Navreet; Bennett, Vicki; Chan, Jaclyn; Kahn, Michael G
Format: Journal Article
Language: English
Published: England, 14 July 2021
ISSN: 1527-974X
Abstract
Objective: Data quality (DQ) must be consistently defined in context. The attributes, metadata, and context of longitudinal real-world data (RWD) have not been formalized for quality improvement across the data production and curation life cycle. We sought to complete a literature review on DQ assessment frameworks, indicators and tools for research, public health, service, and quality improvement across the data life cycle.
Materials and Methods: The review followed PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. Databases from health, physical and social sciences were used: Cinahl, Embase, Scopus, ProQuest, Emcare, PsycINFO, Compendex, and Inspec. Embase was used instead of PubMed (an interface to search MEDLINE) because it includes all MeSH (Medical Subject Headings) terms used and journals in MEDLINE as well as additional unique journals and conference abstracts. A combined data life cycle and quality framework guided the search of published and gray literature for DQ frameworks, indicators, and tools. At least 2 authors independently identified articles for inclusion and extracted and categorized DQ concepts and constructs. All authors discussed findings iteratively until consensus was reached.
Results: The 120 included articles yielded concepts related to contextual (data source, custodian, and user) and technical (interoperability) factors across the data life cycle. Contextual DQ subcategories included relevance, usability, accessibility, timeliness, and trust. Well-tested computable DQ indicators and assessment tools were also found.
Conclusions: A DQ assessment framework that covers intrinsic, technical, and contextual categories across the data life cycle enables assessment and management of RWD repositories to ensure fitness for purpose. Balancing security, privacy, and FAIR principles requires trust and reciprocity, transparent governance, and organizational cultures that value good documentation.
Author_xml – sequence: 1
  givenname: Siaw-Teng
  orcidid: 0000-0001-5989-3614
  surname: Liaw
  fullname: Liaw, Siaw-Teng
  organization: WHO Collaborating Centre on eHealth, School of Population Health, Faculty of Medicine, UNSW Sydney, Sydney, New South Wales, Australia
– sequence: 2
  givenname: Jason Guan Nan
  surname: Guo
  fullname: Guo, Jason Guan Nan
  organization: WHO Collaborating Centre on eHealth, School of Population Health, Faculty of Medicine, UNSW Sydney, Sydney, New South Wales, Australia
– sequence: 3
  givenname: Sameera
  surname: Ansari
  fullname: Ansari, Sameera
  organization: WHO Collaborating Centre on eHealth, School of Population Health, Faculty of Medicine, UNSW Sydney, Sydney, New South Wales, Australia
– sequence: 4
  givenname: Jitendra
  orcidid: 0000-0002-9912-2344
  surname: Jonnagaddala
  fullname: Jonnagaddala, Jitendra
  organization: WHO Collaborating Centre on eHealth, School of Population Health, Faculty of Medicine, UNSW Sydney, Sydney, New South Wales, Australia
– sequence: 5
  givenname: Myron Anthony
  orcidid: 0000-0002-0081-2506
  surname: Godinho
  fullname: Godinho, Myron Anthony
  organization: WHO Collaborating Centre on eHealth, School of Population Health, Faculty of Medicine, UNSW Sydney, Sydney, New South Wales, Australia
– sequence: 6
  givenname: Alder Jose
  surname: Borelli
  fullname: Borelli, Alder Jose
  organization: WHO Collaborating Centre on eHealth, School of Population Health, Faculty of Medicine, UNSW Sydney, Sydney, New South Wales, Australia
– sequence: 7
  givenname: Simon
  orcidid: 0000-0002-8553-2641
  surname: de Lusignan
  fullname: de Lusignan, Simon
  organization: Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, United Kingdom
– sequence: 8
  givenname: Daniel
  surname: Capurro
  fullname: Capurro, Daniel
  organization: Faculty of Engineering and Information Technology, University of Melbourne, Melbourne, Victoria, Australia
– sequence: 9
  givenname: Harshana
  surname: Liyanage
  fullname: Liyanage, Harshana
  organization: Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, United Kingdom
– sequence: 10
  givenname: Navreet
  surname: Bhattal
  fullname: Bhattal, Navreet
  organization: Australian Institute of Health and Welfare, Canberra, Australian Capital Territory, Australia
– sequence: 11
  givenname: Vicki
  surname: Bennett
  fullname: Bennett, Vicki
  organization: Australian Institute of Health and Welfare, Canberra, Australian Capital Territory, Australia
– sequence: 12
  givenname: Jaclyn
  surname: Chan
  fullname: Chan, Jaclyn
  organization: Australian Institute of Health and Welfare, Canberra, Australian Capital Territory, Australia
– sequence: 13
  givenname: Michael G
  orcidid: 0000-0003-4786-6875
  surname: Kahn
  fullname: Kahn, Michael G
  organization: Department of Pediatrics (Section of Informatics and Data Sciences), University of Colorado Anschutz Medical Campus, Denver, Colorado, USA
BackLink https://www.ncbi.nlm.nih.gov/pubmed/33496785$$D View this record in MEDLINE/PubMed
CitedBy_id crossref_primary_10_1007_s10742_023_00319_w
crossref_primary_10_1016_j_csbj_2023_10_006
crossref_primary_10_12677_mse_2025_142050
crossref_primary_10_2196_51560
crossref_primary_10_1016_j_conctc_2024_101354
crossref_primary_10_1136_bmjinnov_2021_000903
crossref_primary_10_1016_j_ijmedinf_2025_105814
crossref_primary_10_1016_j_jbi_2022_104110
crossref_primary_10_1080_20479700_2023_2195197
crossref_primary_10_1016_j_ccc_2023_03_002
crossref_primary_10_1186_s12911_022_01961_z
crossref_primary_10_1093_jamiaopen_ooae062
crossref_primary_10_1177_20539517211019430
crossref_primary_10_1016_j_jval_2024_01_019
crossref_primary_10_1093_jamiaopen_ooae044
crossref_primary_10_1053_j_semvascsurg_2021_10_005
crossref_primary_10_1016_j_jbi_2021_103715
crossref_primary_10_1186_s12911_024_02818_3
crossref_primary_10_2147_CMAR_S441359
crossref_primary_10_1093_jamia_ocac228
crossref_primary_10_1186_s12911_024_02644_7
crossref_primary_10_3390_app12094238
crossref_primary_10_1007_s10926_024_10196_w
crossref_primary_10_1055_s_0042_1760238
crossref_primary_10_1097_NR9_0000000000000077
crossref_primary_10_2196_60244
crossref_primary_10_3389_fphar_2022_988974
crossref_primary_10_2196_45948
crossref_primary_10_1007_s10926_024_10175_1
crossref_primary_10_1016_j_jclinepi_2024_111545
crossref_primary_10_1007_s40615_025_02485_8
crossref_primary_10_1186_s12911_021_01524_8
crossref_primary_10_2196_57615
crossref_primary_10_1007_s00296_023_05354_x
crossref_primary_10_1007_s10660_025_09973_3
crossref_primary_10_1016_j_ijmedinf_2021_104470
crossref_primary_10_1016_j_ijmedinf_2023_105262
crossref_primary_10_1186_s12911_021_01643_2
crossref_primary_10_2196_60709
crossref_primary_10_32604_cmc_2023_031491
crossref_primary_10_1093_jamiaopen_ooae058
crossref_primary_10_3390_cells11132112
crossref_primary_10_1007_s11414_023_09875_y
ContentType Journal Article
Copyright The Author(s) 2021. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.
DOI 10.1093/jamia/ocaa340
DatabaseName Medline
MEDLINE
MEDLINE (Ovid)
MEDLINE
MEDLINE
PubMed
MEDLINE - Academic
DatabaseTitle MEDLINE
Medline Complete
MEDLINE with Full Text
PubMed
MEDLINE (Ovid)
MEDLINE - Academic
DatabaseTitleList MEDLINE - Academic
MEDLINE
Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 2
  dbid: 7X8
  name: MEDLINE - Academic
  url: https://search.proquest.com/medline
  sourceTypes: Aggregation Database
DeliveryMethod no_fulltext_linktorsrc
Discipline Medicine
EISSN 1527-974X
ExternalDocumentID 33496785
Genre Journal Article
Review
Research Support, N.I.H., Extramural
ISICitedReferencesCount 50
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000685209200028&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 1527-974X
IngestDate Sun Sep 28 03:02:26 EDT 2025
Wed Feb 19 02:27:18 EST 2025
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 7
Keywords data quality
DQ assessment tools
data stewardship
DQ indicators
DQ measures
data custodianship
literature review
Language English
License The Author(s) 2021. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.
LinkModel DirectLink
ORCID 0000-0002-0081-2506
0000-0003-4786-6875
0000-0001-5989-3614
0000-0002-8553-2641
0000-0002-9912-2344
OpenAccessLink https://www.ncbi.nlm.nih.gov/pmc/articles/8475229
PMID 33496785
PQID 2481105096
PQPubID 23479
ParticipantIDs proquest_miscellaneous_2481105096
pubmed_primary_33496785
PublicationCentury 2000
PublicationDate 2021-07-14
PublicationDateYYYYMMDD 2021-07-14
PublicationDate_xml – month: 07
  year: 2021
  text: 2021-07-14
  day: 14
PublicationDecade 2020
PublicationPlace England
PublicationPlace_xml – name: England
PublicationTitle Journal of the American Medical Informatics Association : JAMIA
PublicationTitleAlternate J Am Med Inform Assoc
PublicationYear 2021
SSID ssj0016235
Score 2.5733886
SecondaryResourceType review_article
Snippet Data quality (DQ) must be consistently defined in context. The attributes, metadata, and context of longitudinal real-world data (RWD) have not been formalized...
SourceID proquest
pubmed
SourceType Aggregation Database
Index Database
StartPage 1591
SubjectTerms Animals
Data Accuracy
Life Cycle Stages
Quality Improvement
Title Quality assessment of real-world data repositories across the data life cycle: A literature review
URI https://www.ncbi.nlm.nih.gov/pubmed/33496785
https://www.proquest.com/docview/2481105096
Volume 28
WOSCitedRecordID wos000685209200028&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText
inHoldings 1
isFullTextHit
isPrint
linkProvider ProQuest