Multi-label text classification on unbalanced Twitter with monolingual model and hyperparameter optimization for hate speech and abusive language detection
| Published in: | International Journal of Advanced and Applied Sciences Vol. 11; no. 5; pp. 177-185 |
|---|---|
| Main Authors: | Alzahrani, Ahmad A.; Bramantoro, Arif; Permana, Asep |
| Format: | Journal Article |
| Language: | English |
| Published: | May 2024 |
| ISSN: | 2313-626X, 2313-3724 |
| Online Access: | https://science-gate.com/IJAAS/Articles/2024/2024-11-05/1021833ijaas202405019.pdf |

| Abstract | The increase in hate speech and abusive language on social media leads to uncomfortable interactions among users. Many publicly available datasets on hate speech and abusive language are imbalanced, particularly those from Indonesian Twitter. To develop a more effective classification model that also accounts for minority classes, we optimized the hyperparameters of a monolingual model, used four different data preprocessing scenarios, and improved the handling of slang words. We evaluated the model by its accuracy, which reached 81.38%. This result came from optimizing the hyperparameters, processing the data without stemming and removing stop words, and enhancing the slang word data. The optimal hyperparameters were a learning rate of 4e-5, a batch size of 16, and a dropout rate of 0.1. However, too high a dropout rate can decrease the model’s performance and its ability to predict less common categories, such as physical- and gender-related hate speech. |
|---|---|
| Author | Alzahrani, Ahmad A.; Bramantoro, Arif (ORCID 0000-0003-2772-9427); Permana, Asep |
| ContentType | Journal Article |
| CorporateAuthor | School of Computing and Informatics, Universiti Teknologi Brunei, Bandar Seri Begawan, Brunei; Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia; Faculty of Information Technology, Universitas Budi Luhur, Jakarta, Indonesia |
| DOI | 10.21833/ijaas.2024.05.019 |
| Discipline | Sciences (General) |
| EISSN | 2313-3724 |
| EndPage | 185 |
| ISSN | 2313-626X |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 5 |
| Language | English |
| ORCID | 0000-0003-2772-9427 |
| OpenAccessLink | https://science-gate.com/IJAAS/Articles/2024/2024-11-05/1021833ijaas202405019.pdf |
| PageCount | 9 |
| PublicationDate | May 2024 |
| PublicationTitle | International Journal of Advanced and Applied Sciences |
| PublicationYear | 2024 |
| StartPage | 177 |
| Title | Multi-label text classification on unbalanced Twitter with monolingual model and hyperparameter optimization for hate speech and abusive language detection |
| Volume | 11 |
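
The abstract attributes part of the 81.38% result to the choice among four preprocessing scenarios and to better slang handling, but this record does not spell out either step. The following Python sketch assumes the four scenarios are the on/off combinations of stemming and stop-word removal, applied after slang normalization. The slang dictionary, stop-word list, `toy_stem`, `preprocess`, and `SCENARIOS` are hypothetical placeholders for illustration, not the authors' resources or code.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

# Hypothetical toy resources for illustration only. A real pipeline would load
# a full Indonesian slang dictionary, stop-word list, and stemmer.
SLANG: Dict[str, str] = {"gak": "tidak", "yg": "yang", "bgt": "sangat"}
STOPWORDS: Set[str] = {"yang", "di", "dan"}


def toy_stem(token: str) -> str:
    """Placeholder stemmer that strips one common Indonesian suffix.

    Not a real morphological stemmer; it only marks where stemming happens.
    """
    for suffix in ("nya", "kan", "an"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def preprocess(text: str, stem: bool, remove_stopwords: bool) -> List[str]:
    """Lowercase, split on whitespace, normalize slang, then optionally
    remove stop words and stem."""
    tokens = text.lower().split()
    # Slang normalization runs first so expanded words can still be filtered or stemmed.
    tokens = [SLANG.get(t, t) for t in tokens]
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    if stem:
        tokens = [toy_stem(t) for t in tokens]
    return tokens


# Assumed scenario grid: every on/off combination of (stem, remove_stopwords).
SCENARIOS: List[Tuple[bool, bool]] = list(product([False, True], repeat=2))

if __name__ == "__main__":
    tweet = "Gak suka makanan yg di sana"
    for stem, remove_sw in SCENARIOS:
        print(f"stem={stem}, remove_stopwords={remove_sw}:",
              preprocess(tweet, stem, remove_sw))
```

Keeping slang normalization ahead of the optional steps means an expanded word such as `yg` → `yang` can still be caught by the stop-word filter in the scenarios that use it.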
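
The abstract reports the best hyperparameters (learning rate 4e-5, batch size 16, dropout rate 0.1) but not how they were searched. As one possible illustration, a plain grid search is sketched below. The candidate values are assumptions chosen only so that the reported optimum lies inside the grid, and `evaluate` is a hypothetical callback that would fine-tune the monolingual model with a given configuration and return its validation accuracy.

```python
from itertools import product
from typing import Callable, Dict, Optional, Sequence, Tuple

Config = Dict[str, float]


def grid_search(
    evaluate: Callable[[Config], float],
    learning_rates: Sequence[float] = (2e-5, 3e-5, 4e-5, 5e-5),
    batch_sizes: Sequence[int] = (16, 32),
    dropouts: Sequence[float] = (0.1, 0.2, 0.3),
) -> Tuple[Optional[Config], float]:
    """Try every combination and keep the one with the highest score
    returned by `evaluate` (higher is better)."""
    best_cfg: Optional[Config] = None
    best_score = float("-inf")
    for lr, bs, dp in product(learning_rates, batch_sizes, dropouts):
        cfg: Config = {"learning_rate": lr, "batch_size": bs, "dropout": dp}
        # Hypothetical: fine-tune with cfg, then score on a validation split.
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

With the default grid this means 4 × 2 × 3 = 24 fine-tuning runs, so in practice the grid is kept small or replaced by random or Bayesian search. Including several dropout values in the grid is what makes it possible to observe the effect the abstract describes, where a higher dropout rate hurts the less common categories.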
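
The abstract summarizes a multi-label task with a single accuracy figure, and this record does not say which definition of accuracy was used. The sketch below implements two common definitions, subset (exact-match) accuracy and label-wise (Hamming) accuracy, plus per-label recall, which is one way to check how well rare categories such as physical- and gender-related hate speech are detected. The function names and the 0/1 label-matrix layout (rows are tweets, columns are labels) are assumptions for illustration.

```python
import math
from typing import List, Sequence

LabelMatrix = Sequence[Sequence[int]]  # rows = samples, columns = labels (0/1)


def subset_accuracy(y_true: LabelMatrix, y_pred: LabelMatrix) -> float:
    """Share of samples whose whole label vector is predicted exactly."""
    hits = sum(list(t) == list(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def hamming_accuracy(y_true: LabelMatrix, y_pred: LabelMatrix) -> float:
    """Share of individual (sample, label) decisions that are correct."""
    correct = total = 0
    for t, p in zip(y_true, y_pred):
        correct += sum(a == b for a, b in zip(t, p))
        total += len(t)
    return correct / total


def per_label_recall(y_true: LabelMatrix, y_pred: LabelMatrix) -> List[float]:
    """Recall for each label column; NaN when a label has no positive samples."""
    recalls: List[float] = []
    for j in range(len(y_true[0])):
        positives = [i for i, t in enumerate(y_true) if t[j] == 1]
        if not positives:
            recalls.append(math.nan)
            continue
        tp = sum(1 for i in positives if y_pred[i][j] == 1)
        recalls.append(tp / len(positives))
    return recalls
```

On imbalanced data the two accuracies can diverge sharply: a model that never predicts a rare label can still score well on Hamming accuracy, while its per-label recall for that label drops to zero.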