Quality assessment of real-world data repositories across the data life cycle: A literature review
| Published in: | Journal of the American Medical Informatics Association : JAMIA, Vol. 28, No. 7, p. 1591 |
|---|---|
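| Main authors: | Liaw, Siaw-Teng; Guo, Jason Guan Nan; Ansari, Sameera; Jonnagaddala, Jitendra; Godinho, Myron Anthony; Borelli, Alder Jose; de Lusignan, Simon; Capurro, Daniel; Liyanage, Harshana; Bhattal, Navreet; Bennett, Vicki; Chan, Jaclyn; Kahn, Michael G |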
| Format: | Journal Article |
| Language: | English |
| Published: | England, 14 July 2021 |
| Subjects: | Animals; Data Accuracy; Life Cycle Stages; Quality Improvement |
| ISSN: | 1527-974X |
| Abstract | OBJECTIVE: Data quality (DQ) must be consistently defined in context. The attributes, metadata, and context of longitudinal real-world data (RWD) have not been formalized for quality improvement across the data production and curation life cycle. We sought to conduct a literature review on DQ assessment frameworks, indicators, and tools for research, public health, service, and quality improvement across the data life cycle. MATERIALS AND METHODS: The review followed PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. Databases from the health, physical, and social sciences were searched: CINAHL, Embase, Scopus, ProQuest, Emcare, PsycINFO, Compendex, and Inspec. Embase was used instead of PubMed (an interface for searching MEDLINE) because it covers all MeSH (Medical Subject Headings) terms and journals in MEDLINE, as well as additional unique journals and conference abstracts. A combined data life cycle and quality framework guided the search of published and gray literature for DQ frameworks, indicators, and tools. At least 2 authors independently identified articles for inclusion, then extracted and categorized DQ concepts and constructs. All authors discussed the findings iteratively until consensus was reached. RESULTS: The 120 included articles yielded concepts related to contextual (data source, custodian, and user) and technical (interoperability) factors across the data life cycle. Contextual DQ subcategories included relevance, usability, accessibility, timeliness, and trust. Well-tested computable DQ indicators and assessment tools were also identified. CONCLUSIONS: A DQ assessment framework that covers intrinsic, technical, and contextual categories across the data life cycle enables the assessment and management of RWD repositories to ensure fitness for purpose. Balancing security, privacy, and the FAIR (Findable, Accessible, Interoperable, Reusable) principles requires trust and reciprocity, transparent governance, and organizational cultures that value good documentation. |
| Author | Ansari, Sameera; Bennett, Vicki; Kahn, Michael G; Guo, Jason Guan Nan; Liaw, Siaw-Teng; Borelli, Alder Jose; Liyanage, Harshana; Chan, Jaclyn; Jonnagaddala, Jitendra; de Lusignan, Simon; Bhattal, Navreet; Godinho, Myron Anthony; Capurro, Daniel |
| Author_xml | 1. Liaw, Siaw-Teng (ORCID 0000-0001-5989-3614): WHO Collaborating Centre on eHealth, School of Population Health, Faculty of Medicine, UNSW Sydney, Sydney, New South Wales, Australia; 2. Guo, Jason Guan Nan: WHO Collaborating Centre on eHealth, School of Population Health, Faculty of Medicine, UNSW Sydney, Sydney, New South Wales, Australia; 3. Ansari, Sameera: WHO Collaborating Centre on eHealth, School of Population Health, Faculty of Medicine, UNSW Sydney, Sydney, New South Wales, Australia; 4. Jonnagaddala, Jitendra (ORCID 0000-0002-9912-2344): WHO Collaborating Centre on eHealth, School of Population Health, Faculty of Medicine, UNSW Sydney, Sydney, New South Wales, Australia; 5. Godinho, Myron Anthony (ORCID 0000-0002-0081-2506): WHO Collaborating Centre on eHealth, School of Population Health, Faculty of Medicine, UNSW Sydney, Sydney, New South Wales, Australia; 6. Borelli, Alder Jose: WHO Collaborating Centre on eHealth, School of Population Health, Faculty of Medicine, UNSW Sydney, Sydney, New South Wales, Australia; 7. de Lusignan, Simon (ORCID 0000-0002-8553-2641): Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, United Kingdom; 8. Capurro, Daniel: Faculty of Engineering and Information Technology, University of Melbourne, Melbourne, Victoria, Australia; 9. Liyanage, Harshana: Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, United Kingdom; 10. Bhattal, Navreet: Australian Institute of Health and Welfare, Canberra, Australian Capital Territory, Australia; 11. Bennett, Vicki: Australian Institute of Health and Welfare, Canberra, Australian Capital Territory, Australia; 12. Chan, Jaclyn: Australian Institute of Health and Welfare, Canberra, Australian Capital Territory, Australia; 13. Kahn, Michael G (ORCID 0000-0003-4786-6875): Department of Pediatrics (Section of Informatics and Data Sciences), University of Colorado Anschutz Medical Campus, Denver, Colorado, USA |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/33496785 (View this record in MEDLINE/PubMed) |
| CitedBy_id | crossref_primary_10_1007_s10742_023_00319_w crossref_primary_10_1016_j_csbj_2023_10_006 crossref_primary_10_12677_mse_2025_142050 crossref_primary_10_2196_51560 crossref_primary_10_1016_j_conctc_2024_101354 crossref_primary_10_1136_bmjinnov_2021_000903 crossref_primary_10_1016_j_ijmedinf_2025_105814 crossref_primary_10_1016_j_jbi_2022_104110 crossref_primary_10_1080_20479700_2023_2195197 crossref_primary_10_1016_j_ccc_2023_03_002 crossref_primary_10_1186_s12911_022_01961_z crossref_primary_10_1093_jamiaopen_ooae062 crossref_primary_10_1177_20539517211019430 crossref_primary_10_1016_j_jval_2024_01_019 crossref_primary_10_1093_jamiaopen_ooae044 crossref_primary_10_1053_j_semvascsurg_2021_10_005 crossref_primary_10_1016_j_jbi_2021_103715 crossref_primary_10_1186_s12911_024_02818_3 crossref_primary_10_2147_CMAR_S441359 crossref_primary_10_1093_jamia_ocac228 crossref_primary_10_1186_s12911_024_02644_7 crossref_primary_10_3390_app12094238 crossref_primary_10_1007_s10926_024_10196_w crossref_primary_10_1055_s_0042_1760238 crossref_primary_10_1097_NR9_0000000000000077 crossref_primary_10_2196_60244 crossref_primary_10_3389_fphar_2022_988974 crossref_primary_10_2196_45948 crossref_primary_10_1007_s10926_024_10175_1 crossref_primary_10_1016_j_jclinepi_2024_111545 crossref_primary_10_1007_s40615_025_02485_8 crossref_primary_10_1186_s12911_021_01524_8 crossref_primary_10_2196_57615 crossref_primary_10_1007_s00296_023_05354_x crossref_primary_10_1007_s10660_025_09973_3 crossref_primary_10_1016_j_ijmedinf_2021_104470 crossref_primary_10_1016_j_ijmedinf_2023_105262 crossref_primary_10_1186_s12911_021_01643_2 crossref_primary_10_2196_60709 crossref_primary_10_32604_cmc_2023_031491 crossref_primary_10_1093_jamiaopen_ooae058 crossref_primary_10_3390_cells11132112 crossref_primary_10_1007_s11414_023_09875_y |
| ContentType | Journal Article |
| Copyright | The Author(s) 2021. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com. |
| DBID | CGR CUY CVF ECM EIF NPM 7X8 |
| DOI | 10.1093/jamia/ocaa340 |
| DatabaseName | MEDLINE; MEDLINE (Ovid); PubMed; MEDLINE - Academic |
| DatabaseTitle | MEDLINE; Medline Complete; MEDLINE with Full Text; PubMed; MEDLINE (Ovid); MEDLINE - Academic |
| DatabaseTitleList | MEDLINE - Academic; MEDLINE |
| Database_xml | 1. PubMed (dbid: NPM; url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed; source type: Index Database); 2. MEDLINE - Academic (dbid: 7X8; url: https://search.proquest.com/medline; source type: Aggregation Database) |
| DeliveryMethod | no_fulltext_linktorsrc |
| Discipline | Medicine |
| EISSN | 1527-974X |
| ExternalDocumentID | 33496785 |
| Genre | Journal Article; Review; Research Support, N.I.H., Extramural |
| ISICitedReferencesCount | 50 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000685209200028&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Sun Sep 28 03:02:26 EDT 2025; Wed Feb 19 02:27:18 EST 2025 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 7 |
| Keywords | data quality; DQ assessment tools; data stewardship; DQ indicators; DQ measures; data custodianship; literature review |
| Language | English |
| LinkModel | DirectLink |
| ORCID | 0000-0002-0081-2506 0000-0003-4786-6875 0000-0001-5989-3614 0000-0002-8553-2641 0000-0002-9912-2344 |
| OpenAccessLink | https://www.ncbi.nlm.nih.gov/pmc/articles/8475229 |
| PMID | 33496785 |
| PQID | 2481105096 |
| PQPubID | 23479 |
| ParticipantIDs | proquest_miscellaneous_2481105096 pubmed_primary_33496785 |
| PublicationCentury | 2000 |
| PublicationDate | 2021-07-14 |
| PublicationDecade | 2020 |
| PublicationPlace | England |
| PublicationTitle | Journal of the American Medical Informatics Association : JAMIA |
| PublicationTitleAlternate | J Am Med Inform Assoc |
| PublicationYear | 2021 |
| SSID | ssj0016235 |
| SecondaryResourceType | review_article |
| SourceID | proquest pubmed |
| SourceType | Aggregation Database; Index Database |
| StartPage | 1591 |
| SubjectTerms | Animals; Data Accuracy; Life Cycle Stages; Quality Improvement |
| Title | Quality assessment of real-world data repositories across the data life cycle: A literature review |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/33496785; https://www.proquest.com/docview/2481105096 |
| Volume | 28 |
| WOSCitedRecordID | wos000685209200028 |
| inHoldings | 1 |
| linkProvider | ProQuest |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Quality+assessment+of+real-world+data+repositories+across+the+data+life+cycle%3A+A+literature+review&rft.jtitle=Journal+of+the+American+Medical+Informatics+Association+%3A+JAMIA&rft.au=Liaw%2C+Siaw-Teng&rft.au=Guo%2C+Jason+Guan+Nan&rft.au=Ansari%2C+Sameera&rft.au=Jonnagaddala%2C+Jitendra&rft.date=2021-07-14&rft.issn=1527-974X&rft.eissn=1527-974X&rft.volume=28&rft.issue=7&rft.spage=1591&rft_id=info:doi/10.1093%2Fjamia%2Focaa340&rft.externalDBID=NO_FULL_TEXT |