Improving the Accuracy of Social Media Sentiment Classification with the Combination of TF-IDF Method and Random Forest Algorithm
| Title: | Improving the Accuracy of Social Media Sentiment Classification with the Combination of TF-IDF Method and Random Forest Algorithm |
|---|---|
| Authors: | Siti Mutmainah, Fathir, Erin Eka Citra |
| Source: | Journix: Journal of Informatics and Computing, vol. 1, pp. 30-40 |
| Publisher information: | Yayasan Ran Edu Center, 2025. |
| Publication year: | 2025 |
| Description: | Sentiment classification of social media text is a central challenge in public opinion analysis. The large volume of data and the variety of informal language make the task difficult, particularly for Indonesian. This research aims to improve the accuracy of social media sentiment classification by combining Term Frequency-Inverse Document Frequency (TF-IDF) as the text representation with the Random Forest algorithm as the classification model. The dataset consists of 20,000 Indonesian opinion texts collected from Twitter and Instagram and labeled with three sentiment categories: positive, negative, and neutral. The data were preprocessed through text cleaning, tokenization, stopword removal, stemming, and normalization. Experimental results show that the combination of TF-IDF and Random Forest yields an accuracy of 91.2%, with average precision, recall, and F1-score values above 0.90. Confusion matrix analysis shows that the model classifies positive and negative sentiment effectively but has more difficulty distinguishing neutral sentiment. These findings indicate that the approach is reasonably reliable and can serve as a foundation for industrial-scale sentiment analysis systems and for further research. |
| Document type: | Article |
| ISSN: | 3090-6784 |
| DOI: | 10.63866/journix.v1i1.2 |
| Rights: | CC BY-SA |
| Accession number: | edsair.doi...........933e6cbe01f4579c3ba4253d71cd170e |
| Database: | OpenAIRE |
| Publication date: | 30 April 2025 |
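The description names TF-IDF as the text representation but this record does not give the exact weighting the authors used. As a rough illustration only, the sketch below computes TF-IDF weights for tokenized posts using length-normalized term frequency and a smoothed inverse document frequency, `log((1 + N) / (1 + df)) + 1`. Both choices, the function name `tfidf`, and the sample posts are assumptions for illustration, not details from the article.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length,
    idf = log((1 + N) / (1 + df)) + 1 (smoothed).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return weights


# Hypothetical posts, assumed to be already cleaned, tokenized, and stemmed.
posts = [
    ["layanan", "bagus", "cepat"],
    ["layanan", "buruk", "lambat"],
    ["layanan", "biasa"],
]
vectors = tfidf(posts)
```

In this toy input, a term that occurs in every post ("layanan") receives a lower weight than a term that occurs in only one ("bagus"), which is the property that makes TF-IDF useful for separating sentiment-bearing words from common ones. In a full pipeline, vectors like these would be passed to a Random Forest classifier, for example scikit-learn's `RandomForestClassifier`; whether the authors used that library is not stated in this record.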