A Multimodal Saliency Model for Videos With High Audio-Visual Correspondence
Audio information has been overlooked by most current visual attention prediction studies. However, sound can influence visual attention, and this influence has been widely investigated and confirmed by many psychological studies. In this paper, we propose a novel multi-modal saliency (MMS)...
Saved in:
| Published in: | IEEE Transactions on Image Processing, Volume 29, pp. 3805-3819 |
|---|---|
| Main authors: | Min, Xiongkuo; Zhai, Guangtao; Zhou, Jiantao; Zhang, Xiao-Ping; Yang, Xiaokang; Guan, Xinping |
| Format: | Journal Article |
| Language: | English |
| Published: | United States: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.01.2020 |
| Subjects: | Audio-visual attention; visual attention; saliency; multimodal; attention fusion |
| ISSN: | 1057-7149 (print); 1941-0042 (online) |
| Online access: | Get full text |
| Abstract | Audio information has been overlooked by most current visual attention prediction studies. However, sound can influence visual attention, and this influence has been widely investigated and confirmed by many psychological studies. In this paper, we propose a novel multi-modal saliency (MMS) model for videos containing scenes with high audio-visual correspondence. In such scenes, humans tend to be attracted by the sound sources, and it is also possible to localize the sound sources via cross-modal analysis. Specifically, we first detect the spatial and temporal saliency maps from the visual modality using a novel free energy principle. We then detect the audio saliency map from both the audio and visual modalities by localizing the moving-sounding objects using cross-modal kernel canonical correlation analysis, which is the first of its kind in the literature. Finally, we propose a new two-stage adaptive audio-visual saliency fusion method to integrate the spatial, temporal and audio saliency maps into our audio-visual saliency map. The proposed MMS model captures the influence of audio, which is not considered in the latest deep-learning-based saliency models. To take advantage of both deep saliency modeling and audio-visual saliency modeling, we propose to combine deep saliency models and the MMS model via late fusion, and we find that an average performance gain of 5% is obtained. Experimental results on audio-visual attention databases show that the introduced models incorporating audio cues have significant superiority over state-of-the-art image and video saliency models which utilize a single visual modality. |
|---|---|
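The first step described in the abstract derives spatial and temporal saliency maps from the visual stream with a free energy principle, where, loosely, regions that an internal generative model predicts poorly register as surprising and hence salient. The record does not include the model itself, so the following is only a minimal stand-in sketch: it replaces the learned internal model with a plain box-filter predictor and uses the pooled prediction residual as a saliency proxy. The function names and the window size `k` are illustrative assumptions, not the paper's formulation.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def spatial_surprise(frame, k=7):
    """Spatial saliency proxy: predict each pixel as the mean of its k x k
    neighbourhood and pool the absolute prediction error. The box-filter
    predictor stands in for the paper's learned internal model."""
    f = frame.astype(np.float64)
    residual = np.abs(f - uniform_filter(f, size=k))
    sal = uniform_filter(residual, size=k)
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-12)

def temporal_surprise(prev_frame, frame, k=7):
    """Temporal counterpart: surprise of the current frame given the previous
    one, approximated by the pooled absolute frame difference."""
    diff = np.abs(frame.astype(np.float64) - prev_frame.astype(np.float64))
    sal = uniform_filter(diff, size=k)
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-12)
```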
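The audio saliency step localizes moving-sounding objects by maximizing the correlation between audio and visual feature sequences with kernel canonical correlation analysis. The exact features and kernels are not given in this record; the sketch below illustrates the underlying idea with plain (linear) CCA on toy per-frame features, scoring each candidate region by how strongly its motion correlates with the audio track. The feature choices, the ridge term `reg`, and the synthetic signals are assumptions made for illustration only.

```python
import numpy as np

def cca_correlation(X, Y, reg=1e-3):
    """Leading canonical correlation between two per-frame feature matrices
    X (n_frames x dx) and Y (n_frames x dy); ridge regularization keeps the
    covariance matrices well conditioned on short clips."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = len(X)
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    Lx = np.linalg.cholesky(Cxx)                         # Cxx = Lx @ Lx.T
    Ly = np.linalg.cholesky(Cyy)
    M = np.linalg.solve(Lx, Cxy) @ np.linalg.inv(Ly).T   # whitened cross-covariance
    return np.linalg.svd(M, compute_uv=False)[0]         # top singular value = correlation

# Toy example: three candidate regions; only region 1 moves in sync with the audio.
rng = np.random.default_rng(0)
t = np.arange(200)
audio = np.column_stack([np.sin(0.3 * t), np.cos(0.3 * t)]) + 0.1 * rng.standard_normal((200, 2))
regions = [0.1 * rng.standard_normal((200, 3)) for _ in range(3)]
regions[1][:, 0] += np.sin(0.3 * t)  # region 1's motion follows the sound

scores = [cca_correlation(audio, r) for r in regions]
print(int(np.argmax(scores)), np.round(scores, 2))  # region 1 scores highest
```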
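The fusion steps mentioned in the abstract (a two-stage adaptive integration of the spatial, temporal and audio maps, followed by late fusion with a deep model) are not specified in this record. The sketch below shows one plausible adaptive weighting scheme; the equal spatial/temporal split, the confidence-driven audio weight, and the `alpha` parameter are illustrative assumptions rather than the paper's actual coefficients, and the reported 5% gain is the paper's result, not something this toy code reproduces.

```python
import numpy as np

def normalize(s):
    """Rescale a saliency map to [0, 1]; flat maps come back as zeros."""
    s = s.astype(np.float64)
    span = s.max() - s.min()
    return (s - s.min()) / span if span > 0 else np.zeros_like(s)

def fuse_audiovisual(spatial, temporal, audio, av_confidence):
    """Stage 1: merge the spatial and temporal visual maps.
    Stage 2: blend in the audio saliency map, weighted by how strongly the
    audio and visual streams appear to correspond (e.g. a canonical
    correlation score clipped to [0, 1])."""
    visual = normalize(0.5 * normalize(spatial) + 0.5 * normalize(temporal))
    w = float(np.clip(av_confidence, 0.0, 1.0))
    return normalize((1.0 - w) * visual + w * normalize(audio))

def late_fuse(mms_map, deep_map, alpha=0.5):
    """Average the hand-crafted audio-visual map with a deep model's prediction."""
    return normalize(alpha * normalize(mms_map) + (1.0 - alpha) * normalize(deep_map))

# Toy usage with random 36 x 64 maps standing in for real saliency predictions.
rng = np.random.default_rng(1)
spatial, temporal, audio, deep = (rng.random((36, 64)) for _ in range(4))
final = late_fuse(fuse_audiovisual(spatial, temporal, audio, av_confidence=0.8), deep)
```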
| Author | Min, Xiongkuo; Zhai, Guangtao; Zhou, Jiantao; Zhang, Xiao-Ping; Yang, Xiaokang; Guan, Xinping |
| Author details | 1. Xiongkuo Min (minxiongkuo@gmail.com), Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University, Shanghai, China; 2. Guangtao Zhai (zhaiguangtao@sjtu.edu.cn), Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University, Shanghai, China; 3. Jiantao Zhou (jtzhou@umac.mo), Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macau, China; 4. Xiao-Ping Zhang (xzhang@ee.ryerson.ca), Department of Electrical and Computer Engineering, Ryerson University, Toronto, ON, Canada; 5. Xiaokang Yang (xkyang@sjtu.edu.cn), Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University, Shanghai, China; 6. Xinping Guan (xpguan@sjtu.edu.cn), Department of Automation, Shanghai Jiao Tong University, Shanghai, China |
| CODEN | IIPRE4 |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020 |
| DOI | 10.1109/TIP.2020.2966082 |
| Discipline | Applied Sciences Engineering |
| EISSN | 1941-0042 |
| EndPage | 3819 |
| Genre | orig-research Journal Article |
| GrantInformation | National Natural Science Foundation of China (61901260, 61831015, 61521062, 61527804); Universidade de Macau (MYRG2016-00137-FST, MYRG2018-00029-FST); Macau Science and Technology Development Fund (FDCT/022/2017/A1, FDCT/077/2018/A2); China Postdoctoral Science Foundation (BX20180197, 2019M651496) |
| ISICitedReferencesCount | 219 |
| ISSN | 1057-7149 1941-0042 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
| ORCID | 0000-0002-6015-2618 0000-0003-1858-8538 0000-0001-8165-9322 0000-0001-5693-0416 0000-0001-5241-0069 |
| PMID | 31976898 |
| PageCount | 15 |
| PublicationDate | 2020-01-01 |
| PublicationPlace | United States |
| PublicationTitle | IEEE transactions on image processing |
| PublicationTitleAbbrev | TIP |
| PublicationTitleAlternate | IEEE Trans Image Process |
| PublicationYear | 2020 |
| Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| StartPage | 3805 |
| SubjectTerms | Adaptation models; attention fusion; Audio data; Audio-visual attention; Brain modeling; Computational modeling; Correlation analysis; Free energy; Machine learning; Modal analysis; Modelling; multimodal; Predictive models; Salience; saliency; Sound sources; Videos; visual attention; Visual databases; Visualization |
| Title | A Multimodal Saliency Model for Videos With High Audio-Visual Correspondence |
| URI | https://ieeexplore.ieee.org/document/8962278 https://www.ncbi.nlm.nih.gov/pubmed/31976898 https://www.proquest.com/docview/2349126348 https://www.proquest.com/docview/2344264335 |
| Volume | 29 |