Integrating implicit and explicit linguistic phenomena via multi-task learning for offensive language detection

Bibliographic Details
Published in: Knowledge-Based Systems, Vol. 258; p. 109965
Main Authors: Plaza-del-Arco, Flor Miriam, Molina-González, M. Dolores, Ureña-López, L. Alfonso, Martín-Valdivia, María-Teresa
Format: Journal Article
Language: English
Published: Elsevier B.V., 22.12.2022
ISSN: 0950-7051, 1872-7409
Description
Summary: The analysis and detection of offensive content in text have become a major challenge for the Natural Language Processing community. Most research on offensive language detection has addressed this task as a sole optimization objective. However, other linguistic phenomena that are arguably correlated with offensive language, and that could therefore help in recognizing this type of problematic content on the Web, have not been explored in depth. The goal of this study is to investigate whether explicit and implicit concepts involved in the expression of offensive language help in detecting this phenomenon, and how to incorporate these concepts into a computational system. We propose a multi-task learning approach that includes such concepts according to the relevance shown by a feature selection method, mutual information. Our experiments show that phenomena such as constructiveness, target group and person, figurative language (sarcasm and mockery), insults, improper language, and emotions, when combined, help to optimize the offensive language detection task, outperforming a state-of-the-art method (the transformer BETO) that we use as our baseline.
• Addressing offensive language detection for Spanish texts.
• Studying implicit and explicit linguistic phenomena for offensive language.
• Assessing the impact of including phenomena via multi-task learning.
• Performance comparison of multi-task learning models with a well-known Transformer.
• Analyzing the knowledge transfer of the explored phenomena.
DOI: 10.1016/j.knosys.2022.109965