Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception

Bibliographic Details
Published in: Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), pp. 18827-18836
Main Authors: Gao, Junyu; Chen, Mengyuan; Xu, Changsheng
Format: Conference Proceeding
Language: English
Published: IEEE 01.06.2023
Subjects:
ISSN: 1063-6919
Abstract With only video-level event labels, this paper targets the task of weakly-supervised audio-visual event perception (WS-AVEP), which aims to temporally localize and categorize events belonging to each modality. Despite recent progress, most existing approaches either ignore the unsynchronized property of audio-visual tracks or discount the complementary modality for explicit enhancement. We argue that, for an event residing in one modality, the modality itself should provide ample presence evidence of this event, while the other complementary modality is encouraged to afford the absence evidence as a reference signal. To this end, we propose to collect Cross-Modal Presence-Absence Evidence (CMPAE) in a unified framework. Specifically, by leveraging uni-modal and cross-modal representations, a presence-absence evidence collector (PAEC) is designed under Subjective Logic theory. To learn the evidence in a reliable range, we propose a joint-modal mutual learning (JML) process, which calibrates the evidence of diverse audible, visible, and audi-visible events adaptively and dynamically. Extensive experiments show that our method surpasses state-of-the-art methods (e.g., absolute gains of 3.6% and 6.1% in terms of event-level visual and audio metrics). Code is available at github.com/MengyuanChen21/CVPR2023-CMPAE.
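The abstract states that the evidence collector is designed under Subjective Logic theory but does not spell out the mapping. As a minimal sketch only, the following shows the standard Subjective Logic conversion from non-negative evidence to belief masses and an uncertainty mass, applied to a two-way presence/absence case. The function name `subjective_opinion` and the two-element `[presence, absence]` layout are assumptions made for illustration; this is not taken from the paper or its repository and may differ from how PAEC is actually implemented.

```python
def subjective_opinion(evidence):
    """Convert non-negative evidence values into a Subjective Logic opinion.

    Illustrative assumption: `evidence` is [presence, absence] for one event.
    Returns (beliefs, uncertainty), where sum(beliefs) + uncertainty == 1.
    """
    if not evidence:
        raise ValueError("evidence must contain at least one value")
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")

    num_classes = len(evidence)
    # Dirichlet strength S = sum_k (e_k + 1), with a unit prior per class.
    strength = sum(evidence) + num_classes
    # Belief mass for each class: b_k = e_k / S.
    beliefs = [e / strength for e in evidence]
    # Uncertainty mass: u = K / S; it shrinks as total evidence grows.
    uncertainty = num_classes / strength
    return beliefs, uncertainty


if __name__ == "__main__":
    # Strong presence evidence, no absence evidence.
    print(subjective_opinion([8.0, 0.0]))
    # No evidence at all: all mass goes to uncertainty.
    print(subjective_opinion([0.0, 0.0]))
```

Under this mapping, an event with no collected evidence in either direction yields full uncertainty rather than a confident "absent" prediction, which is one plausible reason to model presence and absence evidence separately.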
Author Xu, Changsheng
Gao, Junyu
Chen, Mengyuan
Author_xml – sequence: 1
  givenname: Junyu
  surname: Gao
  fullname: Gao, Junyu
  email: junyu.gao@nlpr.ia.ac.cn
  organization: Institute of Automation, Chinese Academy of Sciences (CASIA), State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS)
– sequence: 2
  givenname: Mengyuan
  surname: Chen
  fullname: Chen, Mengyuan
  email: chenmengyuan2021@ia.ac.cn
  organization: Institute of Automation, Chinese Academy of Sciences (CASIA), State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS)
– sequence: 3
  givenname: Changsheng
  surname: Xu
  fullname: Xu, Changsheng
  email: csxu@nlpr.ia.ac.cn
  organization: Institute of Automation, Chinese Academy of Sciences (CASIA), State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS)
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1109/CVPR52729.2023.01805
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Applied Sciences
EISBN 9798350301298
EISSN 1063-6919
EndPage 18836
ExternalDocumentID 10205301
Genre orig-research
GroupedDBID 6IE
6IH
6IL
6IN
AAWTH
ABLEC
ADZIZ
ALMA_UNASSIGNED_HOLDINGS
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
IEGSK
IJVOP
OCL
RIE
RIL
RIO
ID FETCH-LOGICAL-i204t-44913eaaf7a9435dc0c2e58ca047a3a536e124b3b4cd93d77e58d3e8bc7f840f3
IEDL.DBID RIE
ISICitedReferencesCount 32
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001062531303014&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Aug 27 02:56:29 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-i204t-44913eaaf7a9435dc0c2e58ca047a3a536e124b3b4cd93d77e58d3e8bc7f840f3
PageCount 10
ParticipantIDs ieee_primary_10205301
PublicationCentury 2000
PublicationDate 2023-June
PublicationDateYYYYMMDD 2023-06-01
PublicationDate_xml – month: 06
  year: 2023
  text: 2023-June
PublicationDecade 2020
PublicationTitle Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online)
PublicationTitleAbbrev CVPR
PublicationYear 2023
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0003211698
Score 2.4683797
SourceID ieee
SourceType Publisher
StartPage 18827
SubjectTerms Codes
Computer vision
Learning systems
Measurement
Reliability theory
Target tracking
Video: Action and event understanding
Visualization
Title Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception
URI https://ieeexplore.ieee.org/document/10205301
WOSCitedRecordID wos001062531303014&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=Proceedings+%28IEEE+Computer+Society+Conference+on+Computer+Vision+and+Pattern+Recognition.+Online%29&rft.atitle=Collecting+Cross-Modal+Presence-Absence+Evidence+for+Weakly-Supervised+Audio-+Visual+Event+Perception&rft.au=Gao%2C+Junyu&rft.au=Chen%2C+Mengyuan&rft.au=Xu%2C+Changsheng&rft.date=2023-06-01&rft.pub=IEEE&rft.eissn=1063-6919&rft.spage=18827&rft.epage=18836&rft_id=info:doi/10.1109%2FCVPR52729.2023.01805&rft.externalDocID=10205301