Improving the Accuracy of Social Media Sentiment Classification with the Combination of TF-IDF Method and Random Forest Algorithm

Detailed bibliography
Title: Improving the Accuracy of Social Media Sentiment Classification with the Combination of TF-IDF Method and Random Forest Algorithm
Authors: Siti Mutmainah, Fathir, Erin Eka Citra
Source: Journix: Journal of Informatics and Computing. 1:30-40
Publisher information: Yayasan Ran Edu Center, 2025.
Publication year: 2025
Description: Sentiment classification of social media text is one of the main challenges in public opinion analysis. The large volume of data and the diversity of informal language make the task difficult, especially for Indonesian. This research aims to improve the accuracy of social media sentiment classification by combining the Term Frequency-Inverse Document Frequency (TF-IDF) method as a text representation technique with the Random Forest algorithm as the classification model. The dataset consists of 20,000 Indonesian opinion texts collected from Twitter and Instagram and labeled with three sentiment categories: positive, negative, and neutral. The data went through a preprocessing stage comprising text cleaning, tokenization, stopword removal, stemming, and normalization. Experimental results show that the combination of TF-IDF and Random Forest yields an accuracy of 91.2%, with average precision, recall, and F1-score values above 0.90. Confusion matrix analysis shows that the model is highly effective at classifying positive and negative sentiments but has more difficulty distinguishing neutral sentiment. These findings indicate that the approach is reasonably reliable and can serve as a foundation for industrial-scale sentiment analysis systems and for further research.
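The two quantitative ideas in the description, TF-IDF weighting and averaged precision, recall, and F1 read off a confusion matrix, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the tokenized posts and the 3x3 confusion matrix are made-up toy values, the function names `tfidf` and `macro_scores` are hypothetical, and the TF-IDF variant shown (relative term frequency times the natural log of N/df) and the macro averaging may differ from what the authors used. The Random Forest classifier itself is omitted; a library implementation would normally fill that step.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / len(doc), idf = ln(N / df).
    Smoothed or normalized variants are also common.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def macro_scores(confusion):
    """Macro-averaged precision, recall, and F1 from a square matrix.

    confusion[i][j] = items whose true class is i and predicted class is j.
    """
    k = len(confusion)
    precisions, recalls, f1s = [], [], []
    for c in range(k):
        tp = confusion[c][c]
        predicted = sum(confusion[r][c] for r in range(k))  # column total
        actual = sum(confusion[c])                          # row total
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return sum(precisions) / k, sum(recalls) / k, sum(f1s) / k


# Toy, made-up tokenized posts (not taken from the paper's dataset).
docs = [
    ["pelayanan", "bagus", "sekali"],
    ["pelayanan", "buruk"],
    ["biasa", "saja"],
]
weights = tfidf(docs)

# Toy, made-up confusion matrix; rows and columns are ordered
# positive, negative, neutral.
confusion = [
    [8, 1, 1],
    [1, 8, 1],
    [2, 2, 6],
]
precision, recall, f1 = macro_scores(confusion)
```

In the toy matrix the neutral row has the most off-diagonal mass, which is the kind of pattern the description refers to when it says neutral sentiment is harder to distinguish.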
Document type: Article
ISSN: 3090-6784
DOI: 10.63866/journix.v1i1.2
Rights: CC BY SA
Accession number: edsair.doi...........933e6cbe01f4579c3ba4253d71cd170e
Database: OpenAIRE
RecordInfo BibRecord:
  BibEntity:
    Identifiers:
      – Type: doi
        Value: 10.63866/journix.v1i1.2
    Languages:
      – Text: Undetermined
    PhysicalDescription:
      Pagination:
        PageCount: 11
        StartPage: 30
    Titles:
      – TitleFull: Improving the Accuracy of Social Media Sentiment Classification with the Combination of TF-IDF Method and Random Forest Algorithm
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Siti Mutmainah
      – PersonEntity:
          Name:
            NameFull: Fathir
      – PersonEntity:
          Name:
            NameFull: Erin Eka Citra
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 30
              M: 04
              Type: published
              Y: 2025
          Identifiers:
            – Type: issn-print
              Value: 30906784
            – Type: issn-locals
              Value: edsair
            – Type: issn-locals
              Value: edsairFT
          Numbering:
            – Type: volume
              Value: 1
          Titles:
            – TitleFull: Journix: Journal of Informatics and Computing
              Type: main