Multi-lingual opinion mining on YouTube
| Published in: | Information Processing & Management, Volume 52, Issue 1, pp. 46-60 |
|---|---|
| Main authors: | , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Oxford: Elsevier Ltd / Elsevier Science Ltd, 01.01.2016 |
| Subjects: | |
| ISSN: | 0306-4573, 1873-5371 |
| Online access: | Get full text |
| Summary: | • We designed the first model for effectively carrying out opinion mining on YouTube comments. • We propose kernel methods applied to a robust shallow syntactic structure, which improves accuracy for both languages. • Our approach greatly outperforms other basic models in cross-domain settings. • We created a YouTube corpus (in Italian and English) and made it available to the research community. • Comments must be classified into subcategories to make opinion mining effective on YouTube. In order to successfully apply opinion mining (OM) to the large amounts of user-generated content produced every day, we need robust models that handle noisy input well yet can easily be adapted to a new domain or language. Here we focus on opinion mining for YouTube by (i) modeling classifiers that predict the type of a comment and its polarity, while distinguishing whether the polarity is directed towards the product or the video; (ii) proposing a robust shallow syntactic structure (STRUCT) that adapts well when tested across domains; and (iii) evaluating the effectiveness of the proposed structure on two languages, English and Italian. We rely on tree kernels to automatically extract and learn features with better generalization power than traditionally used bag-of-words models. Our extensive empirical evaluation shows that (i) STRUCT outperforms the bag-of-words model within the same domain (up to 2.6% and 3% absolute improvement for Italian and English, respectively); (ii) it is particularly useful when tested across domains (up to more than 4% absolute improvement for both languages), especially when little training data is available (up to 10% absolute improvement); and (iii) the proposed structure is also effective in a lower-resource language scenario, where only less accurate linguistic processing tools are available. |
|---|---|
| ISSN: | 0306-4573, 1873-5371 |
| DOI: | 10.1016/j.ipm.2015.03.002 |
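
To give a concrete feel for the bag-of-words baseline that STRUCT is compared against in the summary above, the sketch below trains a simple comment-polarity classifier on hypothetical labeled comments. It is an illustrative assumption using scikit-learn only; the paper's actual model applies tree kernels to shallow syntactic structures (STRUCT), which this snippet does not implement, and the data shown here is invented.

```python
# Minimal sketch of a bag-of-words comment-polarity baseline, the kind of
# model the paper's STRUCT representation is compared against.
# NOTE: hypothetical data and pipeline for illustration only; this is NOT the
# authors' tree-kernel model and is not tuned to their YouTube corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical in-domain training comments with polarity labels.
train_comments = [
    "I love this tablet, the screen is amazing",
    "Worst phone I have ever bought, total waste of money",
    "Great video, very helpful review",
    "The battery died after two days, really disappointing",
]
train_labels = ["positive", "negative", "positive", "negative"]

# Bag-of-words features (word unigrams and bigrams) + linear SVM classifier.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), lowercase=True),
    LinearSVC(),
)
model.fit(train_comments, train_labels)

# Hypothetical out-of-domain comment: low lexical overlap with the training
# data is exactly where a purely lexical baseline tends to degrade, and where
# the paper reports the largest gains for the structural representation.
print(model.predict(["the camera quality on this thing is superb"]))
```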