Administrative data linkage to Census 2021 in Wales, UK: A cross-sectional study examining completeness and representativeness for population analytics

Introduction Measuring population representativeness is an important methodological step in public health and epidemiological studies. Objectives To explore the representativeness of Census 2021 data linkage when compared with the Welsh Demographic Service Dataset (WDSD) within the Secure Anonymised...

Full description

Saved in:
Bibliographic Details
Published in:International journal of population data science Vol. 10; no. 1
Main Authors: Jane Lyons, Rhodri Johnson, Michael Edwards, Samantha Turner, Richard Fry, Lucy J Griffiths, Ronan Lyons
Format: Journal Article
Language:English
Published: Swansea University 01.11.2025
Subjects:
ISSN:2399-4908
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Introduction Measuring population representativeness is an important methodological step in public health and epidemiological studies. Objectives To explore the representativeness of Census 2021 data linkage when compared with the Welsh Demographic Service Dataset (WDSD) within the Secure Anonymised Information Linkage (SAIL) Databank for research on the population of Wales, UK. To understand the characteristics of individuals linked and not linked and which subgroups of the population are disproportionately represented in data linkage population-wide studies. Methods An observational, population-wide cross-sectional comparison study, utilising administrative demographic data and decennial survey data held in SAIL. Two data sources, the WDSD and Census 2021, were used to create and compare two cohorts of the resident population of Wales, UK, on 21st March 2021. The two cohorts were linked to understand how many individuals from Census 2021 can be successfully linked within SAIL, in WDSD and not in Census 2021, and found across both sources. Logistic regression models analysed the variation in the linkability of the survey data within SAIL by various demographic and household characteristics. Results The central analytical cohort contained 2,440,191 individuals present in both data sources. WDSD contained 3,090,976 individuals with 2,965,196 individuals in Census data. With a positively classed outcome indicating non-linkage from WDS to Census the characteristics associated with the highest odds of individuals being registered in WDS but not linked to Census (in SAIL) are male (aOR = 1.28 [95%CI 1.28,1.32]), 75+ years of age (aOR = 1.27 [95%CI 1.25,1.29]), of Asian ethnicity (aOR = 1.27 [95%CI 1.24,1.30]), a more recent migrant (arriving to UK after 2000) (aOR = 1.30 [95%CI 1.28,1.32]), a member of the LGBTQ+ community (aOR = 1.29 [95%CI 1.25,1.29]) or not disclosing LGBTQ+ status (aOR = 1.41 [95%CI 1.39,1.43]), being separated, divorced or widowed (aOR = 1.28 [95%CI 1.27,1.29]), or living in rental accommodation (aOR = 1.47 [95%CI 1.45,1.48]). Conclusions Results show that certain personal characteristics and sub-groups of the population of Wales are disproportionately represented when combining population estimates and utilising Census data in data linkage population-wide studies in SAIL.
ISSN:2399-4908
DOI:10.23889/ijpds.v10i1.2994