A multilingual offensive language detection method based on transfer learning from transformer fine-tuning model
| Published in: | Journal of King Saud University. Computer and information sciences Vol. 34; No. 8; pp. 6048-6056 |
|---|---|
| Main authors: | Fatima-zahra El-Alami, Said Ouatik El Alaoui, Noureddine En Nahnahi |
| Format: | Journal Article |
| Language: | English |
| Published: | Springer, 2022-09-01 |
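| Keywords: | Multilingual; Natural language processing; Offensive language detection; Social media; Text classification; Transfer learning |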
| ISSN: | 1319-1578 |
| Online access: | Full text |
| Abstract | Offensive communications have invaded social media content. One of the most effective solutions to this problem is using computational techniques to distinguish offensive content. Moreover, social media users come from linguistically different communities. This study tackles the Multilingual Offensive Language Detection (MOLD) task using transfer learning models and a fine-tuning phase. We propose an effective approach based on Bidirectional Encoder Representations from Transformers (BERT), which has shown great potential in capturing the semantics and contextual information within texts. The proposed system consists of three stages: (1) preprocessing, (2) text representation using BERT models, and (3) classification into two categories: offensive and non-offensive. To handle multilingualism, we explore techniques such as the joint-multilingual and translation-based approaches. The first consists of developing a single classification system for different languages; the second translates all texts into one universal language and then classifies them. We conduct several experiments on a bilingual dataset extracted from the Semi-supervised Offensive Language Identification Dataset (SOLID). The experimental findings show that the translation-based method in conjunction with Arabic BERT (AraBERT) achieves over 93% and 91% in terms of F1-score and accuracy, respectively. |
|---|---|
| Author | Said Ouatik El Alaoui; Fatima-zahra El-Alami; Noureddine En Nahnahi |
| Author_xml | 1. Fatima-zahra El-Alami (corresponding author): Laboratory of Informatics, Signals, Automatic and Cognitivism, FSDM, Sidi Mohamed Ben Abdellah University, Fez, Morocco; 2. Said Ouatik El Alaoui: Laboratory of Informatics, Signals, Automatic and Cognitivism, FSDM, Sidi Mohamed Ben Abdellah University, Fez, Morocco, and Engineering Sciences Laboratory, National School of Applied Sciences, Ibn Tofail University, Kenitra, Morocco; 3. Noureddine En Nahnahi: Laboratory of Informatics, Signals, Automatic and Cognitivism, FSDM, Sidi Mohamed Ben Abdellah University, Fez, Morocco |
| ContentType | Journal Article |
| DOI | 10.1016/j.jksuci.2021.07.013 |
| DatabaseName | DOAJ Directory of Open Access Journals |
| Discipline | Computer Science |
| EndPage | 6056 |
| ExternalDocumentID | oai_doaj_org_article_b073900c393149ba98db40c9f2c14123 |
| ISICitedReferencesCount | 43 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000862930600005&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Mon Nov 03 22:07:12 EST 2025 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 8 |
| OpenAccessLink | https://doaj.org/article/b073900c393149ba98db40c9f2c14123 |
| PageCount | 9 |
| PublicationCentury | 2000 |
| PublicationDate | 2022-09-01 |
| PublicationDecade | 2020 |
| PublicationTitle | Journal of King Saud University. Computer and information sciences |
| PublicationYear | 2022 |
| Publisher | Springer |
| SourceType | Open Website |
| StartPage | 6048 |
| SubjectTerms | Multilingual; Natural language processing; Offensive language detection; Social media; Text classification; Transfer learning |
| Volume | 34 |
| WOSCitedRecordID | wos000862930600005 |
| linkProvider | Directory of Open Access Journals |
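
The abstract names three stages (preprocessing, BERT representation, binary classification) and two strategies for multilingual input. The toy sketch below only illustrates how the joint-multilingual and translation-based strategies differ in control flow, plus the accuracy and F1 measures the abstract reports. It is not the authors' implementation: `lexicon_classifier` is a hypothetical keyword stand-in for a fine-tuned BERT/AraBERT model, `toy_translate` is a hypothetical dictionary stand-in for machine translation, the preprocessing steps are assumed, and using Arabic (`"ar"`) as the pivot language is an assumption made for illustration.

```python
"""Toy sketch of the two multilingual strategies named in the abstract.

Hypothetical stand-ins replace the real components: lexicon_classifier plays the
role of a fine-tuned BERT/AraBERT classifier and toy_translate plays the role of
a machine-translation system. Neither reflects the authors' implementation.
"""
from typing import Callable, List, Sequence, Set, Tuple

Classifier = Callable[[str], int]        # 1 = offensive, 0 = non-offensive
Translator = Callable[[str, str], str]   # (text, source_lang) -> pivot-language text

# Hypothetical toy vocabulary ("ghabi" and "marhaba" are transliterated Arabic).
TOY_EN_TO_PIVOT = {"idiot": "ghabi", "hello": "marhaba"}
TOY_PIVOT_OFFENSIVE = {"ghabi"}
TOY_MULTI_OFFENSIVE = {"idiot", "ghabi"}


def preprocess(text: str) -> str:
    """Stage 1 (assumed steps): lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


def lexicon_classifier(lexicon: Set[str]) -> Classifier:
    """Stand-in for stages 2-3: flag a text if any token is in the lexicon."""
    def classify(text: str) -> int:
        return int(any(tok in lexicon for tok in text.split()))
    return classify


def toy_translate(text: str, source_lang: str) -> str:
    """Stand-in for MT: word-by-word dictionary lookup (source_lang unused here)."""
    return " ".join(TOY_EN_TO_PIVOT.get(tok, tok) for tok in text.split())


def joint_multilingual(texts: Sequence[str], clf: Classifier) -> List[int]:
    """Joint strategy: one classifier applied directly to texts in every language."""
    return [clf(preprocess(t)) for t in texts]


def translation_based(texts: Sequence[str], langs: Sequence[str],
                      translate: Translator, pivot_clf: Classifier,
                      pivot: str = "ar") -> List[int]:
    """Translation strategy: map non-pivot texts into the pivot language, then classify."""
    preds = []
    for text, lang in zip(texts, langs):
        text = preprocess(text)
        if lang != pivot:
            text = translate(text, lang)
        preds.append(pivot_clf(text))
    return preds


def accuracy_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Accuracy and F1 for the offensive (positive) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return acc, f1


if __name__ == "__main__":
    texts = ["Hello friend", "You IDIOT", "marhaba", "ya ghabi"]
    langs = ["en", "en", "ar", "ar"]
    y_true = [0, 1, 0, 1]
    joint = joint_multilingual(texts, lexicon_classifier(TOY_MULTI_OFFENSIVE))
    trans = translation_based(texts, langs, toy_translate,
                              lexicon_classifier(TOY_PIVOT_OFFENSIVE))
    print(joint, accuracy_f1(y_true, joint))  # [0, 1, 0, 1] (1.0, 1.0)
    print(trans, accuracy_f1(y_true, trans))  # [0, 1, 0, 1] (1.0, 1.0)
```

The design point is only where language handling happens: the joint strategy needs one model that accepts every language, while the translation-based strategy needs one pivot-language model plus a translation step. The toy scores say nothing about the 93% F1-score and 91% accuracy reported in the abstract.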