Measuring Global Disease with Wikipedia: Success, Failure, and a Research Agenda

Effective disease monitoring provides a foundation for effective public health systems. This has historically been accomplished with patient contact and bureaucratic aggregation, which tends to be slow and expensive. Recent internet-based approaches promise to be real-time and cheap, with few parame...

Full description

Saved in:
Bibliographic Details
Published in:CSCW : proceedings of the Conference on Computer-Supported Cooperative Work. Conference on Computer-Supported Cooperative Work Vol. 2017; p. 1812
Main Authors: Priedhorsky, Reid, Osthus, Dave, Daughton, Ashlynn R, Moran, Kelly R, Generous, Nicholas, Fairchild, Geoffrey, Deshpande, Alina, Del Valle, Sara Y
Format: Journal Article
Language:English
Published: United States 01.01.2017
Subjects:
Online Access:Get more information
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Effective disease monitoring provides a foundation for effective public health systems. This has historically been accomplished with patient contact and bureaucratic aggregation, which tends to be slow and expensive. Recent internet-based approaches promise to be real-time and cheap, with few parameters. However, the question of these approaches work remains open. We addressed this question using Wikipedia access logs and category links. Our experiments, replicable and extensible using our open source code and data, test the effect of semantic article filtering, amount of training data, forecast horizon, and model staleness by comparing across 6 diseases and 4 countries using thousands of individual models. We found that our minimal-configuration, language-agnostic article selection process based on semantic relatedness is effective for improving predictions, and that our approach is relatively insensitive to the amount and age of training data. We also found, in contrast to prior work, very little forecasting value, and we argue that this is consistent with theoretical considerations about the nature of forecasting. These mixed results lead us to propose that the currently observational field of internet-based disease surveillance must pivot to include theoretical models of information flow as well as controlled experiments based on simulations of disease.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
DOI:10.1145/2998181.2998183