Multi-label text classification on unbalanced Twitter with monolingual model and hyperparameter optimization for hate speech and abusive language detection

Bibliographic Details
Published in: International Journal of Advanced and Applied Sciences, Vol. 11, no. 5, pp. 177-185
Main Authors: Alzahrani, Ahmad A., Bramantoro, Arif, Permana, Asep
Format: Journal Article
Language: English
Published: 01.05.2024
ISSN: 2313-626X, 2313-3724
Abstract: The increase in hate speech and abusive language on social media leads to uncomfortable interactions among users. Many publicly available datasets that address hate speech and abusive language are imbalanced, particularly those from Indonesian Twitter. To develop a more effective classification model that also accounts for minority classes, we optimized the hyperparameters of a monolingual model, applied four different data preprocessing scenarios, and improved the treatment of slang words. We assessed the model's effectiveness by its accuracy, which reached 81.38%. This result came from optimizing hyperparameters, processing data without stemming and removing stop words, and enhancing the slang word data. The optimal hyperparameters were a learning rate of 4e-5, a batch size of 16, and a dropout rate of 0.1. However, using too much dropout can decrease the model's performance and its ability to predict less common categories, such as physical- and gender-related hate speech.
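The preprocessing switches mentioned in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the slang dictionary, the stop-word list, the `toy_stem` helper, and the reading of the "four scenarios" as the 2x2 combinations of stemming and stop-word removal are hypothetical and are not taken from the paper. Only the three hyperparameter values come from the abstract.

```python
from itertools import product

# Toy resources, assumed for illustration only (not the paper's actual lists).
SLANG = {"gk": "tidak", "yg": "yang", "bgt": "banget"}
STOP_WORDS = {"yang", "dan", "di"}


def toy_stem(token):
    """Hypothetical stand-in for a real Indonesian stemmer: strips a '-nya' suffix."""
    if token.endswith("nya") and len(token) > 5:
        return token[:-3]
    return token


def preprocess(text, stem=False, remove_stop_words=False):
    """Lowercase, normalize slang, then optionally drop stop words and stem."""
    tokens = text.lower().split()
    tokens = [SLANG.get(t, t) for t in tokens]  # slang normalization always runs
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if stem:
        tokens = [toy_stem(t) for t in tokens]
    return " ".join(tokens)


# Assumed reading of the four scenarios: every (stem, remove_stop_words) pair.
SCENARIOS = list(product([False, True], repeat=2))

# Hyperparameter values reported as optimal in the abstract.
BEST_HPARAMS = {"learning_rate": 4e-5, "batch_size": 16, "dropout": 0.1}

if __name__ == "__main__":
    sample = "Rumahnya yg di sana gk bagus bgt"
    for stem, remove_stop_words in SCENARIOS:
        print(stem, remove_stop_words, "->",
              preprocess(sample, stem, remove_stop_words))
```

In a real experiment, each scenario's output would be fed to the tokenizer of the chosen monolingual model and fine-tuned with the hyperparameters above; that training step is omitted here because the record does not name the model or the training setup.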
Author details:
1. Alzahrani, Ahmad A.
2. Bramantoro, Arif (ORCID: 0000-0003-2772-9427)
3. Permana, Asep
Affiliations:
School of Computing and Informatics, Universiti Teknologi Brunei, Bandar Seri Begawan, Brunei
Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
Faculty of Information Technology, Universitas Budi Luhur, Jakarta, Indonesia
DOI 10.21833/ijaas.2024.05.019
Discipline Sciences (General)
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
OpenAccessLink https://science-gate.com/IJAAS/Articles/2024/2024-11-05/1021833ijaas202405019.pdf
PageCount 9