A Multimodal Saliency Model for Videos With High Audio-Visual Correspondence


Detailed bibliography
Published in: IEEE Transactions on Image Processing, Volume 29, pp. 3805–3819
Main authors: Min, Xiongkuo; Zhai, Guangtao; Zhou, Jiantao; Zhang, Xiao-Ping; Yang, Xiaokang; Guan, Xinping
Format: Journal Article
Language: English
Published: United States, IEEE, 01.01.2020
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 1057-7149; EISSN: 1941-0042
Online access: Get full text
Abstract Audio information has been overlooked by most current visual attention prediction studies. However, sound can influence visual attention, and this influence has been widely investigated and confirmed by psychological studies. In this paper, we propose a novel multi-modal saliency (MMS) model for videos containing scenes with high audio-visual correspondence. In such scenes, humans tend to be attracted by the sound sources, and it is also possible to localize the sound sources via cross-modal analysis. Specifically, we first detect the spatial and temporal saliency maps from the visual modality using a novel free energy principle. Then we detect the audio saliency map from both the audio and visual modalities by localizing the moving-sounding objects using cross-modal kernel canonical correlation analysis, which is the first of its kind in the literature. Finally, we propose a new two-stage adaptive audio-visual saliency fusion method to integrate the spatial, temporal, and audio saliency maps into our audio-visual saliency map. The proposed MMS model captures the influence of audio, which is not considered in the latest deep-learning-based saliency models. To take advantage of both deep saliency modeling and audio-visual saliency modeling, we propose to combine deep saliency models and the MMS model via late fusion, and we find that an average performance gain of 5% is obtained. Experimental results on audio-visual attention databases show that the introduced models incorporating audio cues are significantly superior to state-of-the-art image and video saliency models that utilize a single visual modality.
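The record carries no code, but the abstract outlines a concrete pipeline: correlate each region's motion with the audio track to localize moving-sounding objects (the paper uses kernel canonical correlation analysis), fuse the spatial, temporal, and audio saliency maps into an audio-visual map, and late-fuse that map with a deep saliency prediction. The sketch below only illustrates that flow under simplifying assumptions: linear CCA from scikit-learn stands in for the paper's kernel CCA, the function names and fixed mixing weights are illustrative, and the fixed weights replace the two-stage adaptive fusion described in the full text.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def audio_visual_correlation(region_motion, audio_feats):
    """First canonical correlation between one region's motion trajectory
    (T x d array) and per-frame audio features (T x k array).
    A linear stand-in for the paper's kernel CCA localization step."""
    u, v = CCA(n_components=1).fit_transform(region_motion, audio_feats)
    return float(np.corrcoef(u[:, 0], v[:, 0])[0, 1])

def normalize(s):
    """Rescale a saliency map to [0, 1]; a constant map becomes all zeros."""
    s = s.astype(np.float64)
    rng = s.max() - s.min()
    return (s - s.min()) / rng if rng > 0 else np.zeros_like(s)

def fuse_audio_visual(spatial, temporal, audio, w=(0.4, 0.3, 0.3)):
    """Weighted combination of the spatial, temporal, and audio saliency maps.
    Fixed weights are a placeholder for the paper's two-stage adaptive fusion."""
    return sum(wi * normalize(m) for wi, m in zip(w, (spatial, temporal, audio)))

def late_fuse(deep_map, mms_map, alpha=0.5):
    """Late fusion of a deep saliency prediction with the MMS map,
    the step the abstract credits with the roughly 5% average gain."""
    return alpha * normalize(deep_map) + (1.0 - alpha) * normalize(mms_map)
```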
Author Min, Xiongkuo
Guan, Xinping
Yang, Xiaokang
Zhou, Jiantao
Zhai, Guangtao
Zhang, Xiao-Ping
Author_xml – sequence: 1
  givenname: Xiongkuo
  orcidid: 0000-0001-5693-0416
  surname: Min
  fullname: Min, Xiongkuo
  email: minxiongkuo@gmail.com
  organization: Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University, Shanghai, China
– sequence: 2
  givenname: Guangtao
  orcidid: 0000-0001-8165-9322
  surname: Zhai
  fullname: Zhai, Guangtao
  email: zhaiguangtao@sjtu.edu.cn
  organization: Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University, Shanghai, China
– sequence: 3
  givenname: Jiantao
  orcidid: 0000-0002-6015-2618
  surname: Zhou
  fullname: Zhou, Jiantao
  email: jtzhou@umac.mo
  organization: Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macau, China
– sequence: 4
  givenname: Xiao-Ping
  orcidid: 0000-0001-5241-0069
  surname: Zhang
  fullname: Zhang, Xiao-Ping
  email: xzhang@ee.ryerson.ca
  organization: Department of Electrical and Computer Engineering, Ryerson University, Toronto, ON, Canada
– sequence: 5
  givenname: Xiaokang
  surname: Yang
  fullname: Yang, Xiaokang
  email: xkyang@sjtu.edu.cn
  organization: Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University, Shanghai, China
– sequence: 6
  givenname: Xinping
  orcidid: 0000-0003-1858-8538
  surname: Guan
  fullname: Guan, Xinping
  email: xpguan@sjtu.edu.cn
  organization: Department of Automation, Shanghai Jiao Tong University, Shanghai, China
BackLink https://www.ncbi.nlm.nih.gov/pubmed/31976898 (View this record in MEDLINE/PubMed)
CODEN IIPRE4
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020
DOI 10.1109/TIP.2020.2966082
Discipline Applied Sciences
Engineering
EISSN 1941-0042
EndPage 3819
ExternalDocumentID 31976898
10_1109_TIP_2020_2966082
8962278
Genre orig-research
Journal Article
GrantInformation_xml – fundername: National Natural Science Foundation of China
  grantid: 61901260; 61831015; 61521062; 61527804
  funderid: 10.13039/501100001809
– fundername: Universidade de Macau
  grantid: MYRG2016-00137-FST; MYRG2018-00029-FST
  funderid: 10.13039/501100004733
– fundername: Macau Science and Technology Development Fund
  grantid: FDCT/022/2017/A1; FDCT/077/2018/A2
– fundername: China Postdoctoral Science Foundation
  grantid: BX20180197; 2019M651496
  funderid: 10.13039/501100002858
ISSN 1057-7149
IsPeerReviewed true
IsScholarly true
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
ORCID 0000-0002-6015-2618
0000-0003-1858-8538
0000-0001-8165-9322
0000-0001-5693-0416
0000-0001-5241-0069
PMID 31976898
PQID 2349126348
PQPubID 85429
PageCount 15
PublicationCentury 2000
PublicationDate 2020-01-01
PublicationDecade 2020
PublicationPlace United States
PublicationPlace_xml – name: United States
– name: New York
PublicationTitle IEEE transactions on image processing
PublicationTitleAbbrev TIP
PublicationTitleAlternate IEEE Trans Image Process
PublicationYear 2020
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 3805
SubjectTerms Adaptation models
attention fusion
Audio data
Audio-visual attention
Brain modeling
Computational modeling
Correlation analysis
Free energy
Machine learning
Modal analysis
Modelling
multimodal
Predictive models
Salience
saliency
Sound sources
Videos
visual attention
Visual databases
Visualization
Title A Multimodal Saliency Model for Videos With High Audio-Visual Correspondence
URI https://ieeexplore.ieee.org/document/8962278
https://www.ncbi.nlm.nih.gov/pubmed/31976898
https://www.proquest.com/docview/2349126348
https://www.proquest.com/docview/2344264335
Volume 29