Integrating implicit and explicit linguistic phenomena via multi-task learning for offensive language detection

Bibliographic details
Published in: Knowledge-based systems, Volume 258, p. 109965
Main authors: Plaza-del-Arco, Flor Miriam; Molina-González, M. Dolores; Ureña-López, L. Alfonso; Martín-Valdivia, María-Teresa
Format: Journal Article
Language: English
Publication details: Elsevier B.V, 22 December 2022
ISSN: 0950-7051, 1872-7409
Description
Summary: The analysis and detection of offensive content in textual information have become a great challenge for the Natural Language Processing community. Most of the research conducted so far on offensive language detection has addressed this task as a sole optimization objective. However, other linguistic phenomena that are arguably correlated with offensive language, and could therefore be beneficial for recognizing this type of problematic content on the Web, have not been explored in depth so far. Thus, the goal of this study is to investigate whether explicit and implicit concepts involved in the expression of offensive language help in the detection of this phenomenon and how to incorporate these concepts in a computational system. We propose a multi-task learning approach that includes such concepts according to the relevance shown by a feature selection method called mutual information. Our experiments show that some phenomena, such as constructiveness, target group and person, figurative language (sarcasm and mockery), insults, improper language, and emotions, when combined, help to optimize the offensive language detection task, outperforming a state-of-the-art method (the transformer BETO) that we use as our baseline for comparison.
• Addressing offensive language detection for Spanish texts.
• Studying implicit and explicit linguistic phenomena for offensive language.
• Assessing the impact of including phenomena via multi-task learning.
• Performance comparison of multi-task learning models with a well-known Transformer.
• Analyzing the knowledge transfer of the explored phenomena.
DOI: 10.1016/j.knosys.2022.109965