Evaluating geographic imputation approaches for zip code level data: an application to a study of pediatric diabetes.
| Title: | Evaluating geographic imputation approaches for zip code level data: an application to a study of pediatric diabetes. |
|---|---|
| Authors: | Hibbert JD (Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, Columbia, SC, USA; hibbert@sc.edu); Liese AD; Lawson A; Porter DE; Puett RC; Standiford D; Liu L; Dabelea D |
| Source: | International journal of health geographics [Int J Health Geogr] 2009 Oct 08; Vol. 8, pp. 54. Date of Electronic Publication: 2009 Oct 08. |
| Publication Type: | Journal Article; Research Support, N.I.H., Extramural; Validation Study |
| Language: | English |
| Journal Info: | Publisher: BioMed Central Country of Publication: England NLM ID: 101152198 Publication Model: Electronic Cited Medium: Internet ISSN: 1476-072X (Electronic) Linking ISSN: 1476072X NLM ISO Abbreviation: Int J Health Geogr Subsets: MEDLINE |
| Imprint Name(s): | Original Publication: London : BioMed Central, [2002]- |
| MeSH Terms: | Data Collection/*standards; Diabetes Mellitus, Type 1/*epidemiology; Diabetes Mellitus, Type 2/*epidemiology; Geographic Information Systems/*standards; Postal Service/*statistics & numerical data; Adolescent; Chi-Square Distribution; Child; Child, Preschool; Cluster Analysis; Humans; Infant; Infant, Newborn; Reproducibility of Results; Stochastic Processes; United States/epidemiology; Young Adult |
| Abstract: | Background: There is increasing interest in the study of place effects on health, facilitated in part by geographic information systems. Incomplete or missing address information reduces geocoding success. Several geographic imputation methods have been suggested to overcome this limitation. Accuracy evaluation of these methods can be focused at the level of individuals and at higher group-levels (e.g., spatial distribution). Methods: We evaluated the accuracy of eight geo-imputation methods for address allocation from ZIP codes to census tracts at the individual and group level. The spatial apportioning approaches underlying the imputation methods included four fixed (deterministic) and four random (stochastic) allocation methods using land area, total population, population under age 20, and race/ethnicity as weighting factors. Data included more than 2,000 geocoded cases of diabetes mellitus among youth aged 0-19 in four U.S. regions. The imputed distribution of cases across tracts was compared to the true distribution using a chi-squared statistic. Results: At the individual level, population-weighted (total or under age 20) fixed allocation showed the greatest level of accuracy, with correct census tract assignments averaging 30.01% across all regions, followed by the race/ethnicity-weighted random method (23.83%). The true distribution of cases across census tracts was that 58.2% of tracts exhibited no cases, 26.2% had one case, 9.5% had two cases, and less than 3% had three or more. This distribution was best captured by random allocation methods, with no significant differences (p-value > 0.90). However, significant differences in distributions based on fixed allocation methods were found (p-value < 0.0003). Conclusion: Fixed imputation methods seemed to yield greatest accuracy at the individual level, suggesting use for studies on area-level environmental exposures. Fixed methods result in artificial clusters in single census tracts. For studies focusing on spatial distribution of disease, random methods seemed superior, as they most closely replicated the true spatial distribution. When selecting an imputation approach, researchers should consider carefully the study aims. |
| References: | Epidemiology. 2003 Jul;14(4):386-91. (PMID: 12843760) Int J Health Geogr. 2008 Jan 23;7:3. (PMID: 18215308) Demography. 2006 May;43(2):383-99. (PMID: 16889134) Am J Prev Med. 2006 Feb;30(2 Suppl):S16-24. (PMID: 16458786) Epidemiology. 2003 Jul;14(4):408-12. (PMID: 12843763) Control Clin Trials. 2004 Oct;25(5):458-71. (PMID: 15465616) Am J Prev Med. 2006 Feb;30(2 Suppl):S77-87. (PMID: 16458794) Am J Public Health. 2002 Jul;92(7):1100-2. (PMID: 12084688) Int J Health Geogr. 2004 Aug 3;3(1):17. (PMID: 15291960) Epidemiology. 2005 Jul;16(4):542-7. (PMID: 15951673) Int J Health Geogr. 2009 Jun 17;8:33. (PMID: 19531266) Int J Health Geogr. 2006 Dec 13;5:58. (PMID: 17166283) Int J Health Geogr. 2003 Dec 19;2(1):10. (PMID: 14687425) Int J Health Geogr. 2008 Nov 26;7:60. (PMID: 19032791) |
| Grant Information: | R01 DK077131 United States DK NIDDK NIH HHS; R01DK077131 United States DK NIDDK NIH HHS |
| Entry Date(s): | Date Created: 20091010 Date Completed: 20100105 Latest Revision: 20240418 |
| Update Code: | 20250114 |
| PubMed Central ID: | PMC2763852 |
| DOI: | 10.1186/1476-072X-8-54 |
| PMID: | 19814809 |
| Database: | MEDLINE |
| ISSN: | 1476-072X |
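
The contrast the abstract draws between fixed and random allocation can be illustrated with a minimal sketch. It assumes that fixed allocation sends every case in a ZIP code to the single tract with the largest weight, and that random allocation draws a tract for each case with probability proportional to the weights. The tract names, the weights, and both function names are hypothetical and are not taken from the study.

```python
import random
from collections import Counter

# Hypothetical weights: the share of one ZIP code's population in each tract.
TRACT_WEIGHTS = {"tract_A": 0.5, "tract_B": 0.3, "tract_C": 0.2}


def fixed_allocation(n_cases, weights):
    """Assumed fixed rule: assign every case to the highest-weight tract."""
    best = max(weights, key=weights.get)
    return [best] * n_cases


def random_allocation(n_cases, weights, rng):
    """Assumed random rule: draw a tract per case, proportional to weight."""
    tracts = list(weights)
    return rng.choices(tracts, weights=[weights[t] for t in tracts], k=n_cases)


rng = random.Random(0)  # seeded so repeated runs give the same draw
fixed = Counter(fixed_allocation(10, TRACT_WEIGHTS))
rand = Counter(random_allocation(10, TRACT_WEIGHTS, rng))

print("fixed: ", dict(fixed))
print("random:", dict(rand))
```

Under these assumptions the fixed rule places all ten cases in one tract, which is the artificial clustering the abstract describes, while the random rule spreads cases across tracts roughly in proportion to the weights.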