The Technological Bridge: R Programming’s Utility in Converting Social Media Data for Quantitative Financial Analysis

Bibliographic Details
Title: The Technological Bridge: R Programming’s Utility in Converting Social Media Data for Quantitative Financial Analysis
Authors: Alexey Litvinenko, Saarinen Samuli, Anna Litvinenko
Source: Economics and Culture, Vol 22, Iss 1, Pp 70-80 (2025)
Publisher Information: Walter de Gruyter GmbH, 2025.
Publication Year: 2025
Subject Terms: r programming, Economics as a science, econometric analysis, HF5001-6182, capm, g14, c58, Business, price non-synchronization, c87, social media data, HB71-74
Description: Research purpose. This study explores whether R programming can transform unstructured qualitative social media data into a quantitative format suitable for econometric modelling. It specifically examines how elements such as text, emojis, and sentiment from Reddit and X (formerly Twitter) can be converted into variables for regression analysis. With the aim of enhancing the predictive power of traditional financial models using alternative data sources, the paper outlines comprehensive guidelines with specific technical steps, from scripting an API to extract data from Reddit and X, through cleaning and tokenising, to incorporating the data into regression models using R programming. The study addresses the growing need in financial economics to incorporate alternative data streams by offering a structured, replicable process for transforming high-volume, unstructured online content into statistically valid variables, thereby bridging the gap between qualitative market sentiment and quantitative modelling. Design / Methodology / Approach. Focusing on the methodology and R scripts, this research adopts a quantitative approach, transforming qualitative social media data into a format suitable for multiple linear and instrumental variable regression models to assess the effect of social media signals on asset prices, with GameStop (GME) and Best Buy (BBY) as case studies. The process ensures reproducibility and includes open-source code, enhancing transparency and applicability for both academic and professional financial data analysis contexts. Findings. The findings demonstrate that qualitative social media data can be quantified for financial analysis: the data were effectively extracted, cleaned, and used for regression analysis. Results show that traditional market indicators failed to explain GME’s price shifts, while the frequency of rocket emojis (interpreted as speculative sentiment) was statistically significant. BBY’s returns, however, aligned more closely with market and industry indices, suggesting a lower influence of private sentiment. Originality / Value / Practical implications. The research provides a replicable method for integrating social media data into econometric models, contributing new tools for analysing market sentiment. By adapting classical financial models to modern data sources, the paper opens new directions for asset pricing research. The paper provides technical tools created in R for use in econometric analysis, useful for both academics and practitioners.
Document Type: Article
Language: English
ISSN: 2256-0173
DOI: 10.2478/jec-2025-0006
Access URL: https://doaj.org/article/f414f6e35be345379ddbf149dfabef22
Rights: CC BY NC ND
Accession Number: edsair.doi.dedup.....ec3e98429c05255e98126f2c328f9bfb
Database: OpenAIRE