A Machine Learning-Based Computational Methodology for Predicting Acute Respiratory Infections Using Social Media Data

We study the relationship between tweets referencing Acute Respiratory Infections (ARI) or COVID-19 symptoms and confirmed cases of these diseases. Additionally, we propose a computational methodology for selecting and applying Machine Learning (ML) algorithms to predict public health indicators usi...

Full description

Saved in:
Bibliographic Details
Published in:Computation Vol. 13; no. 4; p. 86
Main Authors: Ramos-Varela, Jose Manuel, Cuevas-Tello, Juan C., Noyola, Daniel E.
Format: Journal Article
Language:English
Published: Basel MDPI AG 01.04.2025
Subjects:
ISSN:2079-3197, 2079-3197
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:We study the relationship between tweets referencing Acute Respiratory Infections (ARI) or COVID-19 symptoms and confirmed cases of these diseases. Additionally, we propose a computational methodology for selecting and applying Machine Learning (ML) algorithms to predict public health indicators using social media data. To achieve this, a novel pipeline was developed, integrating three distinct models to predict confirmed cases of ARI and COVID-19. The dataset contains tweets related to respiratory diseases, published between 2020 and 2022 in the state of San Luis Potosí, Mexico, obtained via the Twitter API (now X). The methodology is composed of three stages, and it involves tools such as Dataiku and Python with ML libraries. The first two stages focuses on identifying the best-performing predictive models, while the third stage includes Natural Language Processing (NLP) algorithms for tweet selection. One of our key findings is that tweets contributed to improved predictions of ARI confirmed cases but did not enhance COVID-19 time series predictions. The best-performing NLP approach is the combination of Word2Vec algorithm with the KMeans model for tweet selection. Furthermore, predictions for both time series improved by 3% in the second half of 2020 when tweets were included as a feature, where the best prediction algorithm is DeepAR.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:2079-3197
2079-3197
DOI:10.3390/computation13040086