A multilingual offensive language detection method based on transfer learning from transformer fine-tuning model

Bibliographic details
Published in: Journal of King Saud University. Computer and information sciences, Vol. 34, No. 8, pp. 6048-6056
Main authors: Fatima-zahra El-Alami, Said Ouatik El Alaoui, Noureddine En Nahnahi
Format: Journal Article
Language: English
Published: Springer, 1 September 2022
ISSN: 1319-1578
Online access: Full text
Abstract: Offensive content has become pervasive on social media. One of the most effective ways to cope with this problem is to use computational techniques to identify offensive content. Because social media users come from linguistically diverse communities, this study tackles the Multilingual Offensive Language Detection (MOLD) task using transfer learning models and a fine-tuning phase. We propose an approach based on Bidirectional Encoder Representations from Transformers (BERT), which has shown great potential in capturing the semantics and contextual information within texts. The proposed system consists of several stages: (1) preprocessing, (2) text representation using BERT models, and (3) classification into two categories, offensive and non-offensive. To handle multilingualism, we explore different techniques, such as the joint-multilingual and translation-based approaches. The first develops a single classification system for different languages; the second adds a translation phase that converts all texts into one universal language and then classifies them. We conduct several experiments on a bilingual dataset extracted from the Semi-supervised Offensive Language Identification Dataset (SOLID). The experimental findings show that the translation-based method in conjunction with Arabic BERT (AraBERT) achieves over 93% and 91% in terms of F1-score and accuracy, respectively.
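The two multilingual strategies named in the abstract can be sketched as follows. This is only an illustrative outline, not the authors' implementation: the cleaning steps in `preprocess`, the `pivot` parameter, the toy lexicon, and the keyword-based `toy_classifier` are all assumptions standing in for the real preprocessing, translation system, and fine-tuned BERT models, none of which are specified in this record.

```python
import re
from typing import Callable, List, Tuple

Classifier = Callable[[str], str]        # text -> "offensive" | "non-offensive"
Translator = Callable[[str, str], str]   # (text, source_lang) -> text in pivot language
Sample = Tuple[str, str]                 # (text, language code)


def preprocess(text: str) -> str:
    """Stage 1 (assumed cleaning): drop URLs and @mentions, squeeze spaces, lowercase."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"@\w+", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def joint_multilingual(samples: List[Sample], classifier: Classifier) -> List[str]:
    """One classifier receives texts in every language directly."""
    return [classifier(preprocess(text)) for text, _lang in samples]


def translation_based(samples: List[Sample], translate: Translator,
                      classifier: Classifier, pivot: str) -> List[str]:
    """Translate every text into the pivot language, then classify it."""
    labels = []
    for text, lang in samples:
        clean = preprocess(text)
        if lang != pivot:
            clean = translate(clean, lang)
        labels.append(classifier(clean))
    return labels


# Hypothetical stand-ins so the sketch runs without any model or translation service.
TOY_LEXICON = {"idiota": "idiot", "eres": "you are", "un": "a", "gracias": "thanks"}


def toy_translate(text: str, source_lang: str) -> str:
    """Word-by-word dictionary lookup; a real system would call a translation model."""
    return " ".join(TOY_LEXICON.get(word, word) for word in text.split())


def toy_classifier(text: str) -> str:
    """Keyword rule in place of a fine-tuned BERT classifier (stages 2 and 3)."""
    return "offensive" if "idiot" in text.split() else "non-offensive"


samples = [
    ("You are an IDIOT @user", "en"),
    ("eres un idiota http://t.co/x", "es"),
    ("gracias", "es"),
]
labels_translated = translation_based(samples, toy_translate, toy_classifier, pivot="en")
labels_joint = joint_multilingual(samples, toy_classifier)
print(labels_translated)
print(labels_joint)
```

Passing the classifier and translator as callables keeps the routing logic separate from the models, so a monolingual or multilingual BERT classifier could be swapped in without changing either strategy function.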
Authors and affiliations:
1. Fatima-zahra El-Alami (corresponding author): Laboratory of Informatics, Signals, Automatic and Cognitivism, FSDM, Sidi Mohamed Ben Abdellah University, Fez, Morocco
2. Said Ouatik El Alaoui: Laboratory of Informatics, Signals, Automatic and Cognitivism, FSDM, Sidi Mohamed Ben Abdellah University, Fez, Morocco; Engineering Sciences Laboratory, National School of Applied Sciences, Ibn Tofail University, Kenitra, Morocco
3. Noureddine En Nahnahi: Laboratory of Informatics, Signals, Automatic and Cognitivism, FSDM, Sidi Mohamed Ben Abdellah University, Fez, Morocco
DOI: 10.1016/j.jksuci.2021.07.013
Open access link: https://doaj.org/article/b073900c393149ba98db40c9f2c14123
Subject terms: Multilingual; Natural language processing; Offensive language detection; Social media; Text classification; Transfer learning