A Crossmodal Approach to Multimodal Fusion in Video Hyperlinking

Bibliographic Details
Published in:	IEEE MultiMedia, Vol. 25, No. 2, pp. 11-23
Main Authors:	Vukotic, Vedran; Raymond, Christian; Gravier, Guillaume
Format:	Magazine Article
Language:	English
Published:	New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 1 April 2018
Subjects:	Artificial neural networks; bidirectional learning; Coders; Computer architecture; Computer Science; Computer Vision and Pattern Recognition; crossmodal; deep learning; Hypertext systems; Information Retrieval; Machine learning; Multimedia; multimodal autoencoders; multimodal fusion; multimodal retrieval; Neural and Evolutionary Computing; Neural networks; Recommender systems; Representations; Retrieval; shared weights; Streaming media; Task analysis; tied weights; Training; unsupervised representation learning; video hyperlinking; video retrieval; Visualization
ISSN:	1070-986X, 1941-0166

Description
Summary:	With the recent resurgence of neural networks and the proliferation of massive amounts of unlabeled multimodal data, recommendation systems and multimodal retrieval systems based on continuous representation spaces and deep learning methods are attracting great interest. Multimodal representations are typically obtained with autoencoders that reconstruct multimodal data. In this article, we describe an alternative method to perform high-level multimodal fusion that leverages crossmodal translation by means of symmetrical encoders cast into a bidirectional deep neural network (BiDNN). Using the lessons learned from multimodal retrieval, we present a BiDNN-based system that performs video hyperlinking and recommends interesting video segments to a viewer. Results established using TRECVID's 2016 video hyperlinking benchmarking initiative show that our method obtained the best score, thus defining the state of the art.
DOI:	10.1109/MMUL.2018.023121161
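
Illustrative sketch (not part of the catalog record): the summary describes BiDNN only at a high level, as crossmodal translation performed by symmetrical encoders in a bidirectional network. The forward pass below is one possible reading of that idea in Python with NumPy. The layer sizes, the tanh activation, the decision to tie only the central layer (reused transposed in the opposite direction), the fusion by concatenation, and every name in the code are assumptions made for illustration; none of them is taken from the article, and training is omitted entirely.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature sizes, chosen only for illustration.
DIM_TEXT, DIM_VISUAL, DIM_HIDDEN = 6, 10, 4


def init(rows, cols):
    """Small random weight matrix (assumed initialisation)."""
    return rng.normal(scale=0.1, size=(rows, cols))


# Outer layers are private to each translation direction.
W_text_in = init(DIM_TEXT, DIM_HIDDEN)
W_visual_in = init(DIM_VISUAL, DIM_HIDDEN)
W_text_out = init(DIM_HIDDEN, DIM_TEXT)
W_visual_out = init(DIM_HIDDEN, DIM_VISUAL)

# Assumed weight tying: a single central matrix, used as-is when
# translating text -> visual and transposed for visual -> text.
W_central = init(DIM_HIDDEN, DIM_HIDDEN)


def text_to_visual(text):
    """Translate text features to visual features; also return the central activation."""
    hidden = np.tanh(text @ W_text_in)
    central = np.tanh(hidden @ W_central)
    return central, central @ W_visual_out


def visual_to_text(visual):
    """Translate visual features to text features; also return the central activation."""
    hidden = np.tanh(visual @ W_visual_in)
    central = np.tanh(hidden @ W_central.T)
    return central, central @ W_text_out


def fuse(text, visual):
    """Assumed fusion: concatenate the central activations of both directions."""
    central_text, _ = text_to_visual(text)
    central_visual, _ = visual_to_text(visual)
    return np.concatenate([central_text, central_visual], axis=1)


# Three hypothetical video segments with random stand-in features.
text_features = rng.normal(size=(3, DIM_TEXT))
visual_features = rng.normal(size=(3, DIM_VISUAL))

_, predicted_visual = text_to_visual(text_features)
_, predicted_text = visual_to_text(visual_features)
fused = fuse(text_features, visual_features)
```

In this sketch each segment ends up with one fused vector of length 2 * DIM_HIDDEN. A retrieval or hyperlinking system could compare such vectors with a similarity measure, but the measure the authors actually use is not stated in this record.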