A multilingual offensive language detection method based on transfer learning from transformer fine-tuning model

Bibliographic details
Published in:	Journal of King Saud University. Computer and information sciences, Vol. 34, No. 8, pp. 6048-6056
Main authors:	Fatima-zahra El-Alami, Said Ouatik El Alaoui, Noureddine En Nahnahi
Format:	Journal Article
Language:	English
Published:	Springer, 2022-09-01
Keywords:	Multilingual; Natural language processing; Offensive language detection; Social media; Text classification; Transfer learning
ISSN:	1319-1578
Online access:	Full text

Abstract	Offensive communications have invaded social media content. One of the most effective solutions to this problem is to use computational techniques to identify offensive content. Moreover, social media users come from linguistically different communities. This study tackles the Multilingual Offensive Language Detection (MOLD) task using transfer learning models and a fine-tuning phase. We propose an approach based on Bidirectional Encoder Representations from Transformers (BERT), which has shown great potential in capturing the semantics and contextual information within texts. The proposed system consists of three stages: (1) preprocessing, (2) text representation using BERT models, and (3) classification into two categories: offensive and non-offensive. To handle multilingualism, we explore two techniques: joint-multilingual and translation-based. The first develops one classification system for all languages; the second adds a translation phase that transforms all texts into one universal language and then classifies them. We conduct several experiments on a bilingual dataset extracted from the Semi-supervised Offensive Language Identification Dataset (SOLID). The experimental findings show that the translation-based method in conjunction with Arabic BERT (AraBERT) achieves over 93% and 91% in terms of F1-score and accuracy, respectively.
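The two multilingual strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cleaning rules in `preprocess`, the toy lexicon classifier, and the word-level toy translator are all hypothetical stand-ins, used here in place of a fine-tuned BERT model and a real translation system so that the control flow of each strategy is visible.

```python
import re
from typing import Callable, List

Classifier = Callable[[str], str]   # text -> "offensive" | "non-offensive"
Translator = Callable[[str], str]   # text -> text in one pivot language


def preprocess(text: str) -> str:
    """Stage 1 (assumed rules): mask URLs and mentions, squeeze spaces, lowercase."""
    text = re.sub(r"https?://\S+", " URL ", text)
    text = re.sub(r"@\w+", " @USER ", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def joint_multilingual(texts: List[str], classifier: Classifier) -> List[str]:
    """Joint-multilingual: one classifier handles texts in every language."""
    return [classifier(preprocess(t)) for t in texts]


def translation_based(texts: List[str], translator: Translator,
                      classifier: Classifier) -> List[str]:
    """Translation-based: translate into one pivot language, then classify."""
    return [classifier(preprocess(translator(t))) for t in texts]


# Hypothetical stand-ins for illustration only.
TOY_LEXICON = {"badword"}                     # pretend offensive vocabulary
TOY_DICTIONARY = {"mot_grossier": "badword"}  # pretend word-level translation


def toy_classifier(text: str) -> str:
    """Stand-in for stages 2 and 3 (BERT representation plus classification)."""
    return "offensive" if TOY_LEXICON & set(text.split()) else "non-offensive"


def toy_translator(text: str) -> str:
    """Stand-in for a machine translation system."""
    return " ".join(TOY_DICTIONARY.get(w, w) for w in text.split())


if __name__ == "__main__":
    samples = ["You are a badword @someone", "mot_grossier alors"]
    print(joint_multilingual(samples, toy_classifier))
    print(translation_based(samples, toy_translator, toy_classifier))
```

In this toy setting the translation step matters only because the stand-in classifier knows a single language; a multilingual model in the joint setting would not share that limitation, so the sketch says nothing about which strategy performs better in practice.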
Affiliations:	Fatima-zahra El-Alami (corresponding author): Laboratory of Informatics, Signals, Automatic and Cognitivism, FSDM, Sidi Mohamed Ben Abdellah University, Fez, Morocco
	Said Ouatik El Alaoui: Laboratory of Informatics, Signals, Automatic and Cognitivism, FSDM, Sidi Mohamed Ben Abdellah University, Fez, Morocco; Engineering Sciences Laboratory, National School of Applied Sciences, Ibn Tofail University, Kenitra, Morocco
	Noureddine En Nahnahi: Laboratory of Informatics, Signals, Automatic and Cognitivism, FSDM, Sidi Mohamed Ben Abdellah University, Fez, Morocco
DOI:	10.1016/j.jksuci.2021.07.013
Open access link:	https://doaj.org/article/b073900c393149ba98db40c9f2c14123