Cyberbullying Detection Based on Semantic-Enhanced Marginalized Denoising Auto-Encoder

Bibliographic Details
Published in:	IEEE Transactions on Affective Computing, Vol. 8, No. 3, pp. 328-339
Main Authors:	Zhao, Rui; Mao, Kezhi
Format:	Journal Article
Language:	English
Published:	Piscataway: IEEE, 01.07.2017; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	Adolescents; Adults; Analytical models; Bullying; Children; Cyberbullying; Cyberbullying detection; Digital media; Feature extraction; Machine learning; Mathematical models; Media; Messages; Noise reduction; Numerical models; representation learning; Representations; Robustness; Robustness (mathematics); Semantics; Short message service; Social networks; stacked denoising autoencoders; Teaching methods; text mining; word embedding
ISSN:	1949-3045

Description
Summary:	As a side effect of increasingly popular social media, cyberbullying has emerged as a serious problem afflicting children, adolescents and young adults. Machine learning techniques make automatic detection of bullying messages in social media possible, and this could help to construct a healthy and safe social media environment. A critical issue in this research area is learning robust and discriminative numerical representations of text messages. In this paper, we propose a new representation learning method to tackle this problem. Our method, named semantic-enhanced marginalized denoising auto-encoder (smSDA), is developed via a semantic extension of the popular deep learning model stacked denoising autoencoder (SDA). The semantic extension consists of semantic dropout noise and sparsity constraints, where the semantic dropout noise is designed based on domain knowledge and the word embedding technique. Our proposed method is able to exploit the hidden feature structure of bullying information and learn a robust and discriminative representation of text. Comprehensive experiments on two public cyberbullying corpora (Twitter and MySpace) are conducted, and the results show that our proposed approaches outperform other baseline text representation learning methods.
DOI:	10.1109/TAFFC.2016.2531682
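
The summary names a marginalized denoising auto-encoder with dropout noise but gives no formulas. The sketch below is not the authors' smSDA. It shows only the generic closed-form solution commonly used for one marginalized denoising layer, extended with a per-feature dropout probability so that some words can be corrupted more often than others. The function name `marginalized_denoising_layer`, the `drop_prob` and `reg` parameters, the toy matrix, and the choice of which features get a higher dropout rate are all assumptions made for illustration. The sparsity constraints and the word-embedding step mentioned in the summary are omitted.

```python
import numpy as np


def marginalized_denoising_layer(X, drop_prob, reg=1e-5):
    """One marginalized denoising layer (illustrative sketch, not smSDA).

    X         : (d, n) array, features by documents (e.g. bag-of-words).
    drop_prob : (d,) array, probability that each feature is zeroed out.
    reg       : small ridge term that keeps the linear system well posed.
    Returns the mapping W of shape (d, d + 1) and hidden features H of shape (d, n).
    """
    d, n = X.shape
    # Append a constant bias row; the bias is never corrupted.
    Xb = np.vstack([X, np.ones((1, n))])
    # q[i] is the probability that feature i survives corruption.
    q = np.append(1.0 - np.asarray(drop_prob, dtype=float), 1.0)

    S = Xb @ Xb.T                         # scatter matrix of the clean data
    # Expected scatter of the corrupted data:
    # off-diagonal entries scale by q[i] * q[j], diagonal entries by q[i].
    Q = S * np.outer(q, q)
    np.fill_diagonal(Q, q * np.diag(S))
    # Expected cross term between clean targets and corrupted inputs.
    P = S[:d, :] * q

    # W = P (Q + reg I)^{-1}; Q is symmetric, so solve the transposed system.
    W = np.linalg.solve(Q + reg * np.eye(d + 1), P.T).T
    H = np.tanh(W @ Xb)                   # nonlinear hidden representation
    return W, H


# Toy usage: 6 word features, 40 documents, random counts.
rng = np.random.default_rng(0)
X_toy = rng.poisson(1.0, size=(6, 40)).astype(float)
# Hypothetical choice: features 0 and 1 stand in for domain-specific words
# and receive a higher dropout probability than the rest.
p_toy = np.array([0.8, 0.8, 0.5, 0.5, 0.5, 0.5])
W_toy, H_toy = marginalized_denoising_layer(X_toy, p_toy)
```

Stacking would repeat the same call with `H_toy` as the next layer's input. How the per-feature probabilities should actually be chosen is specified in the paper, not here.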