A Crossmodal Approach to Multimodal Fusion in Video Hyperlinking
| Published in: | IEEE MultiMedia, Vol. 25, no. 2, pp. 11-23 |
|---|---|
| Main Authors: | Vukotic, Vedran; Raymond, Christian; Gravier, Guillaume |
| Format: | Magazine Article |
| Language: | English |
| Published: | New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.04.2018 |
| Subjects: | |
| ISSN: | 1070-986X, 1941-0166 |
| Abstract | With the recent resurgence of neural networks and the proliferation of massive amounts of unlabeled multimodal data, recommendation systems and multimodal retrieval systems based on continuous representation spaces and deep learning methods are attracting great interest. Multimodal representations are typically obtained with autoencoders that reconstruct multimodal data. In this article, we describe an alternative method to perform high-level multimodal fusion that leverages crossmodal translation by means of symmetrical encoders cast into a bidirectional deep neural network (BiDNN). Using the lessons learned from multimodal retrieval, we present a BiDNN-based system that performs video hyperlinking and recommends interesting video segments to a viewer. Results obtained in TRECVID's 2016 video hyperlinking benchmarking initiative show that our method achieved the best score, thus defining the state of the art. |
|---|---|
| AbstractList | With the recent resurgence of neural networks and the proliferation of massive amounts of unlabeled data, unsupervised learning algorithms have become very popular for organizing and retrieving large video collections, in a task defined as video hyperlinking. Information stored as video typically contains two modalities, an audio one and a visual one, which are used jointly in multimodal systems through fusion. Multimodal autoencoders have long been used to perform multimodal fusion. In this work, we start by evaluating different initial, single-modal representations for automatic speech transcripts and for video keyframes. We then evaluate different autoencoding methods of performing multimodal fusion in an offline setup. The best-performing setup is then evaluated in a live setup at TRECVID's 2016 video hyperlinking task. As in the offline evaluations, we show that focusing on crossmodal translations as a way of performing multimodal fusion yields improved multimodal representations, and that our simple system, trained in an unsupervised manner with no external information, defines the new state of the art in a live video hyperlinking setup. We conclude with an analysis of data gathered after the live evaluations at TRECVID 2016 and express our thoughts on the overall performance of our proposed system. |
| Author | Raymond, Christian; Vukotic, Vedran; Gravier, Guillaume |
| Author_xml | – sequence: 1 givenname: Vedran surname: Vukotic fullname: Vukotic, Vedran email: vedran.vukotic@irisa.fr organization: INRIA/IRISA Rennes and INSA Rennes – sequence: 2 givenname: Christian surname: Raymond fullname: Raymond, Christian email: christian.raymond@irisa.fr organization: INRIA/IRISA Rennes and INSA Rennes – sequence: 3 givenname: Guillaume surname: Gravier fullname: Gravier, Guillaume email: guillaume.gravier@irisa.fr organization: INRIA/IRISA Rennes and CNRS |
| BackLink | https://inria.hal.science/hal-01848539$$DView record in HAL |
| CODEN | IEMUE4 |
| CitedBy_id | crossref_primary_10_1007_s13735_019_00173_y crossref_primary_10_1007_s13735_018_00166_3 crossref_primary_10_1155_2022_3317234 crossref_primary_10_1007_s00779_019_01232_1 |
| Cites_doi | 10.1109/CVPRW.2014.131 10.1007/978-3-642-37444-9_34 10.1007/978-3-540-85287-2_2 10.1145/2983563.2983567 10.1145/2911996.2912064 10.1145/2647868.2654902 10.1145/3065386 10.1007/978-3-319-24033-6_29 |
| ContentType | Magazine Article |
| Copyright | Copyright 2018 The Institute of Electrical and Electronics Engineers, Inc. (IEEE); distributed under a Creative Commons Attribution 4.0 International License |
| Copyright_xml | – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018 – notice: Distributed under a Creative Commons Attribution 4.0 International License |
| DBID | 97E RIA RIE AAYXX CITATION 7SC 7SP 8FD F28 FR3 JQ2 L7M L~C L~D 1XC VOOES |
| DOI | 10.1109/MMUL.2018.023121161 |
| DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005–Present IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE/IET Electronic Library (IEL) (UW System Shared) CrossRef Computer and Information Systems Abstracts Electronics & Communications Abstracts Technology Research Database ANTE: Abstracts in New Technology & Engineering Engineering Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional Hyper Article en Ligne (HAL) Hyper Article en Ligne (HAL) (Open Access) |
| DatabaseTitle | CrossRef Technology Research Database Computer and Information Systems Abstracts – Academic Electronics & Communications Abstracts ProQuest Computer Science Collection Computer and Information Systems Abstracts Engineering Research Database Advanced Technologies Database with Aerospace ANTE: Abstracts in New Technology & Engineering Computer and Information Systems Abstracts Professional |
| DatabaseTitleList | Technology Research Database |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE/IET Electronic Library (IEL) (UW System Shared) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering; Computer Science |
| EISSN | 1941-0166 |
| EndPage | 23 |
| ExternalDocumentID | oai:HAL:hal-01848539v2 10_1109_MMUL_2018_023121161 8424826 |
| Genre | orig-research |
| GroupedDBID | -~X .DC 0R~ 1OL 29I 4.4 5GY 5VS 6IK 97E AAJGR AARMG AASAJ AAWTH ABAZT ABFSI ABQJQ ABVLG ACGFS ACIWK AENEX AETIX AFOGA AGQYO AGSQL AHBIQ AI. AIBXA AKJIK AKQYR ALLEH ALMA_UNASSIGNED_HOLDINGS ATWAV AZLTO BEFXN BFFAM BGNUA BKEBE BPEOZ CS3 E.L EBS EJD HZ~ H~9 ICLAB IEDLZ IFIPE IFJZH IPLJI JAVBF LAI M43 O9- OCL P2P PQQKQ RIA RIE RNI RNS RZB TN5 VH1 ZY4 AAYXX CITATION 7SC 7SP 8FD F28 FR3 JQ2 L7M L~C L~D RIG 1XC VOOES |
| ID | FETCH-LOGICAL-c376t-ae76a6c93d24aeb8d64f7975aa5548feb545e4b54edfb1002b5b93bf160481423 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 7 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000440854300002&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 1070-986X |
| IngestDate | Tue Oct 14 20:31:51 EDT 2025 Sun Jun 29 15:18:20 EDT 2025 Sat Nov 29 02:46:21 EST 2025 Tue Nov 18 20:56:41 EST 2025 Wed Aug 27 02:40:12 EDT 2025 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 2 |
| Keywords | deep learning; bidirectional learning; multimodal fusion; tied weights; shared weights; multimodal autoencoders; unsupervised representation learning; video retrieval; neural networks; multimodal retrieval; video hyperlinking; crossmodal |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0 |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-c376t-ae76a6c93d24aeb8d64f7975aa5548feb545e4b54edfb1002b5b93bf160481423 |
| Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
| ORCID | 0000-0002-2266-5682 |
| OpenAccessLink | https://inria.hal.science/hal-01848539 |
| PQID | 2117122472 |
| PQPubID | 75746 |
| PageCount | 13 |
| ParticipantIDs | hal_primary_oai_HAL_hal_01848539v2 crossref_primary_10_1109_MMUL_2018_023121161 ieee_primary_8424826 proquest_journals_2117122472 crossref_citationtrail_10_1109_MMUL_2018_023121161 |
| PublicationCentury | 2000 |
| PublicationDate | 2018-04-01 |
| PublicationDateYYYYMMDD | 2018-04-01 |
| PublicationDate_xml | – month: 04 year: 2018 text: 2018-04-01 day: 01 |
| PublicationDecade | 2010 |
| PublicationPlace | New York |
| PublicationPlace_xml | – name: New York |
| PublicationTitle | IEEE multimedia |
| PublicationTitleAbbrev | MUL-M |
| PublicationYear | 2018 |
| Publisher | IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE); Institute of Electrical and Electronics Engineers |
| Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) – name: Institute of Electrical and Electronics Engineers |
| SSID | ssj0003697 |
| Score | 1.2956861 |
| SourceID | hal proquest crossref ieee |
| SourceType | Open Access Repository; Aggregation Database; Enrichment Source; Index Database; Publisher |
| StartPage | 11 |
| SubjectTerms | Artificial neural networks; Coders; Computer architecture; Computer Science; Computer Vision and Pattern Recognition; crossmodal; deep learning; Hypertext systems; Information Retrieval; Machine learning; Multimedia; multimodal autoencoders; multimodal fusion; multimodal retrieval; Neural and Evolutionary Computing; Neural networks; Recommender systems; Representations; Retrieval; Streaming media; Task analysis; Training; unsupervised representation learning; video hyperlinking; video retrieval; Visualization |
| Title | A Crossmodal Approach to Multimodal Fusion in Video Hyperlinking |
| URI | https://ieeexplore.ieee.org/document/8424826 https://www.proquest.com/docview/2117122472 https://inria.hal.science/hal-01848539 |
| Volume | 25 |
| WOSCitedRecordID | wos000440854300002&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | IEEE |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=A+Crossmodal+Approach+to+Multimodal+Fusion+in+Video+Hyperlinking&rft.jtitle=IEEE+multimedia&rft.au=Vukotic%2C+Vedran&rft.au=Raymond%2C+Christian&rft.au=Gravier%2C+Guillaume&rft.date=2018-04-01&rft.pub=The+Institute+of+Electrical+and+Electronics+Engineers%2C+Inc.+%28IEEE%29&rft.issn=1070-986X&rft.eissn=1941-0166&rft.volume=25&rft.issue=2&rft.spage=11&rft_id=info:doi/10.1109%2FMMUL.2018.023121161&rft.externalDBID=NO_FULL_TEXT |
| thumbnail_l | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/lc.gif&issn=1070-986X&client=summon |
| thumbnail_m | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/mc.gif&issn=1070-986X&client=summon |
| thumbnail_s | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/sc.gif&issn=1070-986X&client=summon |
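The abstract describes multimodal fusion through crossmodal translation, using symmetrical encoders cast into a bidirectional deep neural network (BiDNN), with automatic speech transcripts and video keyframes as the two modalities. The Python sketch below illustrates that idea under stated assumptions. It is not the authors' implementation, and this record gives none of their layer sizes, activations, or training settings. The following are assumed for illustration only: one hidden layer per modality, tanh activations, random initialization, concatenation of the two central activations as the joint representation, and cosine-similarity ranking as the hyperlinking step. All names (`BiDNNSketch`, `translation_loss`, `embed`, `rank_targets`) are hypothetical. Training, meaning gradient updates that minimize the translation loss, is omitted.

```python
import math
import random

Vector = list[float]
Matrix = list[list[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def transpose(w: Matrix) -> Matrix:
    return [list(col) for col in zip(*w)]


def tanh_vec(v: Vector) -> Vector:
    return [math.tanh(a) for a in v]


def random_matrix(rng: random.Random, rows: int, cols: int) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class BiDNNSketch:
    """Hypothetical BiDNN-style network (assumption, not the authors' code).

    The A->B branch and the B->A branch share the two central weight
    matrices. Each matrix is used as-is in one branch and transposed in
    the other, which makes the two translation directions symmetrical.
    """

    def __init__(self, dim_a: int, dim_b: int, hidden: int, rep: int, seed: int = 0):
        rng = random.Random(seed)
        self.enc_a = random_matrix(rng, hidden, dim_a)  # x_a -> h_a (A->B only)
        self.enc_b = random_matrix(rng, hidden, dim_b)  # x_b -> h_b (B->A only)
        self.w_s = random_matrix(rng, rep, hidden)      # tied: h_a -> r in A->B, r -> h_a via w_s^T in B->A
        self.w_t = random_matrix(rng, hidden, rep)      # tied: r -> h_b in A->B, h_b -> r via w_t^T in B->A
        self.dec_b = random_matrix(rng, dim_b, hidden)  # h_b -> predicted x_b
        self.dec_a = random_matrix(rng, dim_a, hidden)  # h_a -> predicted x_a

    def a_to_b(self, x_a: Vector) -> tuple[Vector, Vector]:
        """Translate modality A into B; return (central activation, prediction of x_b)."""
        r = tanh_vec(matvec(self.w_s, tanh_vec(matvec(self.enc_a, x_a))))
        return r, matvec(self.dec_b, tanh_vec(matvec(self.w_t, r)))

    def b_to_a(self, x_b: Vector) -> tuple[Vector, Vector]:
        """Translate modality B into A through the same central weights, transposed."""
        r = tanh_vec(matvec(transpose(self.w_t), tanh_vec(matvec(self.enc_b, x_b))))
        return r, matvec(self.dec_a, tanh_vec(matvec(transpose(self.w_s), r)))

    def translation_loss(self, x_a: Vector, x_b: Vector) -> float:
        """Squared error of both crossmodal translations (what training would minimize)."""
        _, pred_b = self.a_to_b(x_a)
        _, pred_a = self.b_to_a(x_b)
        err_b = sum((p - t) ** 2 for p, t in zip(pred_b, x_b))
        err_a = sum((p - t) ** 2 for p, t in zip(pred_a, x_a))
        return err_a + err_b

    def embed(self, x_a: Vector, x_b: Vector) -> Vector:
        """Joint representation: concatenation of both central activations (assumed)."""
        return self.a_to_b(x_a)[0] + self.b_to_a(x_b)[0]


def cosine(u: Vector, v: Vector) -> float:
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def rank_targets(anchor: Vector, targets: dict[str, Vector]) -> list[str]:
    """Hypothetical hyperlinking step: target ids by decreasing cosine similarity."""
    return sorted(targets, key=lambda tid: cosine(anchor, targets[tid]), reverse=True)


if __name__ == "__main__":
    net = BiDNNSketch(dim_a=4, dim_b=3, hidden=5, rep=2, seed=42)
    # Toy "transcript" (A) and "keyframe" (B) feature vectors for three segments.
    segments = {
        "seg1": ([0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0]),
        "seg2": ([0.4, 0.3, 0.2, 0.1], [0.0, 0.1, 0.5]),
        "seg3": ([0.1, 0.2, 0.3, 0.5], [0.4, 0.1, 0.1]),
    }
    embeddings = {sid: net.embed(a, b) for sid, (a, b) in segments.items()}
    anchor = embeddings.pop("seg1")
    print(rank_targets(anchor, embeddings))
```

In this sketch the central matrices are shared, so an update to `w_s` or `w_t` made while training one translation direction also changes the other direction. That coupling is plausibly what the record's "tied weights" and "shared weights" keywords refer to. The untrained network above only demonstrates the data flow; its rankings carry no meaning.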