A Crossmodal Approach to Multimodal Fusion in Video Hyperlinking

Bibliographic Details
Published in: IEEE MultiMedia, Vol. 25, no. 2, pp. 11-23
Main Authors: Vukotic, Vedran; Raymond, Christian; Gravier, Guillaume
Format: Magazine Article
Language: English
Published: New York IEEE 01.04.2018
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Institute of Electrical and Electronics Engineers
ISSN: 1070-986X, 1941-0166
Abstract: With the recent resurgence of neural networks and the proliferation of massive amounts of unlabeled multimodal data, recommendation systems and multimodal retrieval systems based on continuous representation spaces and deep learning methods are becoming of great interest. Multimodal representations are typically obtained with autoencoders that reconstruct multimodal data. In this article, we describe an alternative method to perform high-level multimodal fusion that leverages crossmodal translation by means of symmetrical encoders cast into a bidirectional deep neural network (BiDNN). Using the lessons learned from multimodal retrieval, we present a BiDNN-based system that performs video hyperlinking and recommends interesting video segments to a viewer. Results established using TRECVID's 2016 video hyperlinking benchmarking initiative show that our method obtained the best score, thus defining the state of the art.
With the recent resurgence of neural networks and the proliferation of massive amounts of unlabeled data, unsupervised learning algorithms have become very popular for organizing and retrieving large video collections in a task defined as video hyperlinking. Information stored as video typically contains two modalities, namely an audio and a visual one, that are used conjointly in multimodal systems by undergoing fusion. Multimodal autoencoders have long been used for performing multimodal fusion. In this work, we start by evaluating different initial, single-modal representations for automatic speech transcripts and for video keyframes. We progress to evaluating different autoencoding methods of performing multimodal fusion in an offline setup. The best performing setup is then evaluated in a live setup at TRECVID's 2016 video hyperlinking task. As in offline evaluations, we show that focusing on crossmodal translations as a way of performing multimodal fusion yields improved multimodal representations and that our simple system, trained in an unsupervised manner, with no external information, defines the new state of the art in a live video hyperlinking setup. We conclude by performing an analysis on data gathered after the live evaluations at TRECVID 2016 and express our thoughts on the overall performance of our proposed system.
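The abstracts describe fusion through crossmodal translation but do not spell out the network itself. The following is only a minimal sketch of one possible reading, under these assumptions: two translation networks (transcript to keyframe and keyframe to transcript) share their central weights in transposed form, and the fused representation is the concatenation of the two central activations. All sizes, names (`text_to_visual`, `visual_to_text`, `fuse`) and the untrained random weights are hypothetical and serve to show the data flow, not the authors' implementation.

```python
import math
import random

random.seed(0)

# Hypothetical toy sizes: transcript vector, keyframe vector, hidden, central.
D_TEXT, D_VIS, D_HID, D_REP = 6, 8, 5, 3


def init(n_in, n_out):
    """Random n_in x n_out weight matrix stored as a list of rows."""
    scale = 1.0 / math.sqrt(n_in)
    return [[random.gauss(0.0, scale) for _ in range(n_out)] for _ in range(n_in)]


def transpose(w):
    return [list(col) for col in zip(*w)]


def layer(x, w, act=math.tanh):
    """Compute act(x @ w) for a vector x and a matrix w."""
    return [act(sum(xi * row[j] for xi, row in zip(x, w))) for j in range(len(w[0]))]


# Modality-specific outer layers (not shared).
W_text_in, W_vis_in = init(D_TEXT, D_HID), init(D_VIS, D_HID)
W_text_out, W_vis_out = init(D_HID, D_TEXT), init(D_HID, D_VIS)
# Central weights, assumed tied: the opposite direction uses the transposes.
W_a, W_b = init(D_HID, D_REP), init(D_REP, D_HID)


def text_to_visual(x_text):
    """Translate a transcript vector to keyframe space; also return the central activation."""
    rep = layer(layer(x_text, W_text_in), W_a)
    out = layer(layer(rep, W_b), W_vis_out, act=lambda v: v)
    return rep, out


def visual_to_text(x_vis):
    """Translate a keyframe vector to transcript space, reusing W_b and W_a transposed."""
    rep = layer(layer(x_vis, W_vis_in), transpose(W_b))
    out = layer(layer(rep, transpose(W_a)), W_text_out, act=lambda v: v)
    return rep, out


def fuse(x_text, x_vis):
    """Fused representation: concatenation of both central activations."""
    return text_to_visual(x_text)[0] + visual_to_text(x_vis)[0]


def cosine(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    return dot / (math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b)))
```

In a hyperlinking setting, two video segments would each be passed through `fuse` and compared with `cosine`. Training, which would minimize the translation error in both directions, is omitted here.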
Author Raymond, Christian
Vukotic, Vedran
Gravier, Guillaume
Author_xml – sequence: 1
  givenname: Vedran
  surname: Vukotic
  fullname: Vukotic, Vedran
  email: vedran.vukotic@irisa.fr
  organization: INRIA/IRISA Rennes and INSA Rennes
– sequence: 2
  givenname: Christian
  surname: Raymond
  fullname: Raymond, Christian
  email: christian.raymond@irisa.fr
  organization: INRIA/IRISA Rennes and INSA Rennes
– sequence: 3
  givenname: Guillaume
  surname: Gravier
  fullname: Gravier, Guillaume
  email: guillaume.gravier@irisa.fr
  organization: INRIA/IRISA Rennes and CNRS
BackLink https://inria.hal.science/hal-01848539 (View record in HAL)
CODEN IEMUE4
ContentType Magazine Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018
Distributed under a Creative Commons Attribution 4.0 International License
DOI 10.1109/MMUL.2018.023121161
Discipline Engineering
Computer Science
EISSN 1941-0166
EndPage 23
ExternalDocumentID oai:HAL:hal-01848539v2
10_1109_MMUL_2018_023121161
8424826
Genre orig-research
ISICitedReferencesCount 7
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 2
Keywords deep learning
bidirectional learning
multimodal fusion
tied weights
shared weights
multimodal autoencoders
unsupervised representation learning
video retrieval
neural networks
multimodal retrieval
video hyperlinking
crossmodal
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0
ORCID 0000-0002-2266-5682
OpenAccessLink https://inria.hal.science/hal-01848539
PageCount 13
PublicationCentury 2000
PublicationDate 2018-04-01
PublicationDateYYYYMMDD 2018-04-01
PublicationDate_xml – month: 04
  year: 2018
  text: 2018-04-01
  day: 01
PublicationDecade 2010
PublicationPlace New York
PublicationPlace_xml – name: New York
PublicationTitle IEEE multimedia
PublicationTitleAbbrev MUL-M
PublicationYear 2018
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Institute of Electrical and Electronics Engineers
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
– name: Institute of Electrical and Electronics Engineers
StartPage 11
SubjectTerms Artificial neural networks
Coders
Computer architecture
Computer Science
Computer Vision and Pattern Recognition
crossmodal
deep learning
Hypertext systems
Information Retrieval
Machine learning
Multimedia
multimodal autoencoders
multimodal fusion
multimodal retrieval
Neural and Evolutionary Computing
Neural networks
Recommender systems
Representations
Retrieval
Streaming media
Task analysis
Training
unsupervised representation learning
video hyperlinking
video retrieval
Visualization
Title A Crossmodal Approach to Multimodal Fusion in Video Hyperlinking
URI https://ieeexplore.ieee.org/document/8424826
https://www.proquest.com/docview/2117122472
https://inria.hal.science/hal-01848539
Volume 25