Data Harmonization and Data Pooling from Cohort Studies: A Practical Approach for Data Management
Data pooling from pre-existing multiple datasets can be useful to increase study sample size and statistical power to answer a research question. However, individual datasets may contain variables that measure the same construct differently, posing challenges for data pooling. Variable harmonization...
| Published in: | International journal of population data science Vol. 6; No. 1; p. 1680 |
|---|---|
| Main authors: | Adhikari, Kamala; Patten, Scott B; Patel, Alka B; Premji, Shahirose; Tough, Suzanne; Letourneau, Nicole; Giesbrecht, Gerald; Metcalfe, Amy |
| Format: | Journal Article |
| Language: | English |
| Published: | Wales: Swansea University, 01.01.2021 |
| Subjects: | |
| ISSN: | 2399-4908 |
| Online access: | Full text |
| Abstract | Data pooling from multiple pre-existing datasets can increase a study's sample size and statistical power to answer a research question. However, individual datasets may contain variables that measure the same construct differently, which poses challenges for data pooling. Variable harmonization, an approach that can generate comparable datasets from heterogeneous sources, can address this issue in some circumstances. As an illustrative example, this paper describes the data harmonization strategies that helped generate comparable datasets across two Canadian pregnancy cohort studies: All Our Families and Alberta Pregnancy Outcomes and Nutrition.
Variables were harmonized considering multiple features across the datasets: the construct measured; the question asked and its response options; the measurement scale used; the frequency of measurement; the timing of measurement; and the data structure. Based on these features, variables were classified as completely matching, partially matching, or completely unmatching across the datasets. Variables that were an exact match were pooled as is. Partially matching variables were synchronized across the datasets considering the frequency of measurement, the timing of measurement, and response options. Variables that were completely unmatching could not be harmonized into a single variable.
The variable harmonization strategies that were used to generate comparable cohort datasets for data pooling are applicable to other data sources. Future studies may employ or evaluate these strategies. Variable harmonization and pooling provide an opportunity to increase study power and the utility of existing data, permitting researchers to answer novel research questions in a statistically efficient, timely, and cost-efficient manner that could not be achieved using a single data source. |
|---|---|
| Author | Adhikari, Kamala; Patten, Scott B; Patel, Alka B; Premji, Shahirose; Tough, Suzanne; Letourneau, Nicole; Giesbrecht, Gerald; Metcalfe, Amy |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/34888420 (view this record in MEDLINE/PubMed) |
| CitedBy_id | crossref_primary_10_1080_19313152_2023_2289290 crossref_primary_10_26633_RPSP_2025_54 crossref_primary_10_3389_fdgth_2024_1329630 crossref_primary_10_1007_s10803_023_06204_2 crossref_primary_10_2196_75608 crossref_primary_10_1289_EHP12901 crossref_primary_10_1007_s40200_024_01491_7 crossref_primary_10_1080_00273171_2023_2229310 crossref_primary_10_1186_s12911_022_02093_0 crossref_primary_10_2196_67047 crossref_primary_10_3389_froh_2025_1592428 crossref_primary_10_1038_s41597_024_02956_3 crossref_primary_10_1002_cl2_70056 |
| ContentType | Journal Article |
| DOI | 10.23889/ijpds.v6i1.1680 |
| Discipline | Economics |
| EISSN | 2399-4908 |
| ExternalDocumentID | oai_doaj_org_article_f0b734d3454b4ffc9f9840f3a51493db PMC8631396 34888420 10_23889_ijpds_v6i1_1680 |
| Genre | Journal Article |
| GeographicLocations | Alberta |
| ISICitedReferencesCount | 38 |
| ISSN | 2399-4908 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 1 |
| Keywords | cohort studies; data pooling or combination; comparable dataset; data harmonization; harmonization strategies |
| Language | English |
| License | http://creativecommons.org/licenses/by/4.0 This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. |
| Notes | Competing interests: The authors declare that they have no competing interests. |
| OpenAccessLink | https://doaj.org/article/f0b734d3454b4ffc9f9840f3a51493db |
| PMID | 34888420 |
| PublicationDate | 2021-01-01 |
| PublicationPlace | Wales |
| PublicationTitle | International journal of population data science |
| PublicationTitleAlternate | Int J Popul Data Sci |
| PublicationYear | 2021 |
| Publisher | Swansea University |
| StartPage | 1680 |
| SubjectTerms | Alberta Cohort Studies Data Collection Data Management Female Humans Population Data Science Pregnancy Sample Size |
| Title | Data Harmonization and Data Pooling from Cohort Studies: A Practical Approach for Data Management |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/34888420 https://www.proquest.com/docview/2608534224 https://pubmed.ncbi.nlm.nih.gov/PMC8631396 https://doaj.org/article/f0b734d3454b4ffc9f9840f3a51493db |
| Volume | 6 |
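
The abstract describes sorting variables into completely matching, partially matching, and completely unmatching groups before pooling. The following is a minimal Python sketch of that idea, not the authors' implementation. The classification rule (a different construct means unmatching; any other differing feature means partial), the `VariableSpec` fields, the smoking variable, and both toy cohorts are assumptions invented for illustration, and the data-structure feature is left out for brevity.

```python
from dataclasses import dataclass

# Features compared between two cohort variables (names are illustrative).
FEATURES = ("question", "response_options", "scale", "frequency", "timing")


@dataclass(frozen=True)
class VariableSpec:
    construct: str            # what the variable is meant to measure
    question: str             # wording of the question asked
    response_options: tuple   # allowed answers
    scale: str                # measurement scale, e.g. "binary"
    frequency: str            # how often it was measured
    timing: str               # when it was measured


def classify_match(a: VariableSpec, b: VariableSpec) -> str:
    """Return 'complete', 'partial', or 'unmatching' (assumed rule)."""
    if a.construct != b.construct:
        return "unmatching"
    differs = [f for f in FEATURES if getattr(a, f) != getattr(b, f)]
    return "partial" if differs else "complete"


def recode(records, variable, mapping):
    """Map one variable's responses onto a common coding."""
    return [{**r, variable: mapping[r[variable]]} for r in records]


def pool(datasets):
    """Stack harmonized records, tagging each with its source cohort."""
    return [{**r, "cohort": name}
            for name, recs in datasets.items() for r in recs]


# Hypothetical smoking variable, coded differently in two invented cohorts.
spec_a = VariableSpec("smoking", "Do you smoke?", ("yes", "no"),
                      "binary", "once", "second trimester")
spec_b = VariableSpec("smoking", "How often do you smoke?",
                      ("daily", "occasionally", "never"),
                      "ordinal", "once", "second trimester")

cohort_a = [{"id": 1, "smoking": "yes"}, {"id": 2, "smoking": "no"}]
cohort_b = [{"id": 1, "smoking": "daily"}, {"id": 2, "smoking": "never"}]

match = classify_match(spec_a, spec_b)
if match == "partial":
    # Collapse cohort B's three options onto cohort A's binary coding.
    cohort_b = recode(cohort_b, "smoking",
                      {"daily": "yes", "occasionally": "yes", "never": "no"})

pooled = pool({"A": cohort_a, "B": cohort_b})
```

Keeping the source cohort on every pooled record lets later analyses adjust for or stratify by cohort. Collapsing response options loses detail, so a real mapping would need to be justified variable by variable.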