DualGNN: Dual Graph Neural Network for Multimedia Recommendation

One of the important factors affecting micro-video recommender systems is to model the multi-modal user preference on the micro-video. Despite the remarkable performance of prior arts, they are still limited by fusing the user preference derived from different modalities in a unified manner, ignorin...

Full description

Saved in:

Bibliographic Details
Published in:	IEEE transactions on multimedia Vol. 25; pp. 1074 - 1084
Main Authors:	Wang, Qifan, Wei, Yinwei, Yin, Jianhua, Wu, Jianlong, Song, Xuemeng, Nie, Liqiang
Format:	Journal Article
Language:	English
Published:	Piscataway IEEE 2023 The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	Acoustics graph neural network Graph neural networks Learning Micro-video recommender systems Modules multi-modal fusion Multimedia Neural networks Preferences Recommender systems Representation learning Representations Task analysis Video Videos Visualization
ISSN:	1520-9210, 1941-0077
Online Access:	Get full text
Tags:	Add Tag No Tags, Be the first to tag this record!

Description
Summary:	One of the important factors affecting micro-video recommender systems is to model the multi-modal user preference on the micro-video. Despite the remarkable performance of prior arts, they are still limited by fusing the user preference derived from different modalities in a unified manner, ignoring the users tend to place different emphasis on different modalities. Furthermore, modality-missing is ubiquity and unavoidable in the micro-video recommendation, some modalities information of micro-videos are lacked in many cases, which negatively affects the multi-modal fusion operations. To overcome these disadvantages, we propose a novel framework for the micro-video recommendation, dubbed Dual Graph Neural Network (DualGNN), upon the user-microvideo bipartite and user co-occurrence graphs, which leverages the correlation between users to collaboratively mine the particular fusion pattern for each user. Specifically, we first introduce a single-modal representation learning module, which performs graph operations on the user-microvideo graph in each modality to capture single-modal user preferences on different modalities. And then, we devise a multi-modal representation learning module to explicitly model the user's attentions over different modalities and inductively learn the multi-modal user preference. Finally, we propose a prediction module to rank the potential micro-videos for users. Extensive experiments on two public datasets demonstrate the significant superiority of our DualGNN over state-of-the-arts methods.
Bibliography:	ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14
ISSN:	1520-9210 1941-0077
DOI:	10.1109/TMM.2021.3138298