Data Harmonization and Data Pooling from Cohort Studies: A Practical Approach for Data Management

Bibliographic details
Published in: International Journal of Population Data Science, vol. 6, no. 1, p. 1680
Main authors: Adhikari, Kamala; Patten, Scott B; Patel, Alka B; Premji, Shahirose; Tough, Suzanne; Letourneau, Nicole; Giesbrecht, Gerald; Metcalfe, Amy
Format: Journal Article
Language: English
Published: Wales: Swansea University, 01.01.2021
ISSN: 2399-4908
Online access: Full text
Abstract: Pooling data from multiple pre-existing datasets can increase study sample size and statistical power to answer a research question. However, individual datasets may contain variables that measure the same construct differently, which poses challenges for data pooling. Variable harmonization, an approach that can generate comparable datasets from heterogeneous sources, can address this issue in some circumstances. As an illustrative example, this paper describes the data harmonization strategies that helped generate comparable datasets across two Canadian pregnancy cohort studies: All Our Families and Alberta Pregnancy Outcomes and Nutrition. Variables were harmonized considering multiple features across the datasets: the construct measured; the question asked and response options; the measurement scale used; the frequency of measurement; the timing of measurement; and the data structure. Based on these features, variables were classified as completely matching, partially matching, or completely unmatching across the datasets. Variables that were an exact match were pooled as is. Partially matching variables were synchronized across the datasets considering the frequency of measurement, the timing of measurement, and response options. Variables that were completely unmatching could not be harmonized into a single variable. The variable harmonization strategies that were used to generate comparable cohort datasets for data pooling are applicable to other data sources. Future studies may employ or evaluate these strategies. Variable harmonization and pooling provide an opportunity to increase study power and the utility of existing data, permitting researchers to answer novel research questions in a statistically efficient, timely, and cost-efficient manner that could not be achieved using a single data source.
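The classification step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the `VariableSpec` fields, the decision rule in `classify_match`, the `recode` helper, and all example variables and response codes are hypothetical, made up only to show how feature-by-feature comparison might sort variable pairs into completely matching, partially matching, and unmatching groups. The data structure feature is left out for brevity.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class VariableSpec:
    """Hypothetical description of one variable in one cohort dataset."""
    construct: str           # what is measured, e.g. "smoking in pregnancy"
    response_options: tuple  # allowed answers, in questionnaire order
    scale: str               # measurement scale, e.g. "binary" or "ordinal"
    frequency: int           # number of times the item was measured
    timing: str              # when it was measured, e.g. "second trimester"


def classify_match(a: VariableSpec, b: VariableSpec) -> str:
    """Return 'complete', 'partial', or 'unmatching' (assumed rule, for illustration)."""
    if a.construct != b.construct:
        return "unmatching"
    # Same construct: any differing feature makes the pair a partial match.
    differing = [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]
    return "complete" if not differing else "partial"


def recode(values, mapping):
    """Map source response codes onto a shared coding; unmapped codes become None."""
    return [mapping.get(v) for v in values]


# Made-up variable descriptions for two datasets.
smoking_a = VariableSpec("smoking in pregnancy", ("no", "yes"), "binary", 1, "second trimester")
smoking_b = VariableSpec("smoking in pregnancy", ("never", "sometimes", "daily"), "ordinal", 1, "second trimester")
income_a = VariableSpec("household income", ("<40k", "40-80k", ">80k"), "ordinal", 1, "second trimester")

match_same = classify_match(smoking_a, smoking_a)     # identical on every feature
match_partial = classify_match(smoking_a, smoking_b)  # same construct, different options and scale
match_none = classify_match(smoking_a, income_a)      # different constructs

# Collapse the three-level item to the binary coding so both versions can be pooled.
to_binary = {"never": "no", "sometimes": "yes", "daily": "yes"}
harmonized_b = recode(["never", "daily", "sometimes", "unknown"], to_binary)
```

Collapsing to the coarser coding loses detail from the richer item. Whether that trade-off is acceptable depends on the research question, so the mapping is best treated as an analytic decision to document rather than a mechanical step.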
DOI: 10.23889/ijpds.v6i1.1680
Open access: yes
Peer reviewed: yes
Keywords: cohort studies; data pooling or combination; comparable dataset; data harmonization; harmonization strategies
License: http://creativecommons.org/licenses/by/4.0
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Competing interests: The authors declare that they have no competing interests.
Open access link: https://doaj.org/article/f0b734d3454b4ffc9f9840f3a51493db
PMID: 34888420
Subject terms: Alberta; Cohort Studies; Data Collection; Data Management; Female; Humans; Population Data Science; Pregnancy; Sample Size