Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception
| Published in: | Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), pp. 18827-18836 |
|---|---|
| Main authors: | Gao, Junyu; Chen, Mengyuan; Xu, Changsheng |
| Format: | Conference paper |
| Language: | English |
| Publication details: | IEEE, 1 June 2023 |
| Subjects: | |
| ISSN: | 1063-6919 |
| Online access: | Get full text |
| Abstract | With only video-level event labels, this paper targets the task of weakly-supervised audio-visual event perception (WS-AVEP), which aims to temporally localize and categorize events belonging to each modality. Despite recent progress, most existing approaches either ignore the unsynchronized property of audio-visual tracks or discount the complementary modality for explicit enhancement. We argue that, for an event residing in one modality, the modality itself should provide ample presence evidence of this event, while the other complementary modality is encouraged to afford the absence evidence as a reference signal. To this end, we propose to collect Cross-Modal Presence-Absence Evidence (CMPAE) in a unified framework. Specifically, by leveraging uni-modal and cross-modal representations, a presence-absence evidence collector (PAEC) is designed under Subjective Logic theory. To learn the evidence in a reliable range, we propose a joint-modal mutual learning (JML) process, which calibrates the evidence of diverse audible, visible, and audi-visible events adaptively and dynamically. Extensive experiments show that our method surpasses state-of-the-art methods (e.g., absolute gains of 3.6% and 6.1% in terms of event-level visual and audio metrics). Code is available at github.com/MengyuanChen21/CVPR2023-CMPAE. |
|---|---|
| Author | Xu, Changsheng; Gao, Junyu; Chen, Mengyuan |
| Author_xml | 1. Gao, Junyu (junyu.gao@nlpr.ia.ac.cn); 2. Chen, Mengyuan (chenmengyuan2021@ia.ac.cn); 3. Xu, Changsheng (csxu@nlpr.ia.ac.cn). Affiliation of all three authors: Institute of Automation, Chinese Academy of Sciences (CASIA), State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS) |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DOI | 10.1109/CVPR52729.2023.01805 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE/IET Electronic Library; IEEE Proceedings Order Plans (POP) 1998-present |
| Discipline | Applied Sciences |
| EISBN | 9798350301298 |
| EISSN | 1063-6919 |
| EndPage | 18836 |
| ExternalDocumentID | 10205301 |
| Genre | orig-research |
| ISICitedReferencesCount | 32 |
| IngestDate | Wed Aug 27 02:56:29 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| PageCount | 10 |
| PublicationCentury | 2000 |
| PublicationDate | 2023-June |
| PublicationDateYYYYMMDD | 2023-06-01 |
| PublicationDate_xml | – month: 06 year: 2023 text: 2023-June |
| PublicationDecade | 2020 |
| PublicationTitle | Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online) |
| PublicationTitleAbbrev | CVPR |
| PublicationYear | 2023 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 18827 |
| SubjectTerms | Codes; Computer vision; Learning systems; Measurement; Reliability theory; Target tracking; Video: Action and event understanding; Visualization |
| Title | Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception |
| URI | https://ieeexplore.ieee.org/document/10205301 |
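
The abstract states that the presence-absence evidence collector (PAEC) is designed under Subjective Logic theory. As a reading aid, the following is a minimal sketch of the standard Subjective Logic mapping from non-negative evidence to belief masses and an uncertainty mass. It is not taken from the paper or its repository: the function name `opinion_from_evidence` and the two-class `[presence, absence]` layout are assumptions made here for illustration only.

```python
from typing import List, Sequence, Tuple


def opinion_from_evidence(evidence: Sequence[float]) -> Tuple[List[float], float]:
    """Convert per-class evidence into a Subjective Logic opinion.

    Hypothetical helper (not from the paper's code). For K classes with
    evidence e_k >= 0, the Dirichlet strength is S = sum_k (e_k + 1),
    the belief masses are b_k = e_k / S, and the uncertainty is u = K / S,
    so that sum_k b_k + u = 1.
    """
    if len(evidence) == 0:
        raise ValueError("at least one class is required")
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    num_classes = len(evidence)
    strength = sum(evidence) + num_classes  # S = sum_k (e_k + 1)
    beliefs = [e / strength for e in evidence]
    uncertainty = num_classes / strength
    return beliefs, uncertainty


# Assumed layout for illustration: index 0 = presence, index 1 = absence.
beliefs, uncertainty = opinion_from_evidence([8.0, 0.0])
print(beliefs, uncertainty)
```

In this formulation, zero evidence for every class yields an uncertainty of 1, and the uncertainty shrinks as total evidence grows. How the paper collects and calibrates the evidence itself is described in the full text, not here.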