Analyzing Large-Scale Twitter Real Time Streaming Data with Manifold Machine Learning Algorithms in Apache SPARK

Bibliographic details
Published in: 2023 International Conference on Data Science, Agents & Artificial Intelligence (ICDSAAI), pp. 1-9
Main authors: Abhineswari, M, Priyadarshini, R
Format: Conference paper
Language: English
Published: IEEE, 21 December 2023
Description
Summary: With the rapid growth of social media platforms, Twitter has emerged as a valuable source of real-time data on public opinion, sentiment, and trends. At approximately 7,500 tweets per second, it enables individuals, brand organizations, and public influencers to express thoughts, opinions, and updates concisely. This research paper combines Apache Spark, machine learning techniques, and data visualization methods to analyze sentiment patterns in streaming Twitter data. The implementation leverages distributed computing and real-time stream processing to handle large volumes of data. Hashtag counting, noun counting, a sentiment lexicon, and Jaccard similarity are used for trend analysis of tweets, and Logistic Regression, Naive Bayes, Multinomial Naive Bayes, Support Vector Machine (SVM), Random Forest, and Decision Trees are used to classify tweets. An interactive visualization tool such as Tableau generates dynamic representations of sentiment trends and distributions across topics or time periods. Response time is measured alongside accuracy, precision, recall, and F1 score as key performance indicators of the effectiveness and scalability of the proposed system. This research contributes to future advances in sentiment analysis techniques for streaming social media data and opens avenues for further research in this domain.
DOI:10.1109/ICDSAAI59313.2023.10452549
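
The summary names hashtag counting, Jaccard similarity, and precision/recall/F1 without giving implementation details. The following is a minimal illustrative sketch in plain Python, not the authors' Spark implementation. The function names (`hashtag_counts`, `jaccard`, `precision_recall_f1`), the whitespace tokenization, and the sample tweets are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: a hashtag is "#" followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def hashtag_counts(tweets):
    """Count case-insensitive hashtag occurrences across an iterable of tweet texts."""
    counts = Counter()
    for text in tweets:
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(text))
    return counts


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| over lowercase whitespace tokens."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 0.0  # convention chosen here for two empty texts
    return len(set_a & set_b) / len(set_a | set_b)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Binary precision, recall, and F1 for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the paper.
    sample = ["Loving #Spark and #BigData", "#spark streaming is fast"]
    print(hashtag_counts(sample).most_common(2))
    print(jaccard("spark is fast", "spark is slow"))
    print(precision_recall_f1([1, 0, 1, 1], [1, 1, 0, 1]))
```

In a streaming setting, the same per-tweet logic would typically be applied inside a distributed map and aggregated per time window; how the paper does this is not stated in the summary.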