Interactions Guided Generative Adversarial Network for unsupervised image captioning


Published in: Neurocomputing (Amsterdam), Volume 417, pp. 419-431
Main authors: Cao, Shan; An, Gaoyun; Zheng, Zhenxing; Ruan, Qiuqi
Medium: Journal Article
Language: English
Publication details: Elsevier B.V., 5 December 2020
Subjects: Unsupervised image caption; Object-object interactions; Multi-scale feature
ISSN: 0925-2312 (print); 1872-8286 (electronic)
Abstract
Highlights:
• ResNet with a new multi-scale module and adaptive channel attention (RMCNet) is proposed.
• A Mutual Attention Network (MAN) is proposed to reason about interactions among objects.
• Information on object-object interactions is incorporated into adversarial generation.
• Alignment between the image and the sentence is enforced by cycle consistency.
• An effective unsupervised image captioning model, IGGAN, is proposed.

Most current image captioning models that have achieved great success depend heavily on manually labeled image-caption pairs. However, acquiring large-scale paired data is expensive and time-consuming. In this paper, we propose the Interactions Guided Generative Adversarial Network (IGGAN) for unsupervised image captioning, which jointly exploits multi-scale feature representation and object-object interactions. To obtain a robust feature representation, the image is encoded by ResNet with a new multi-scale module and adaptive channel attention (RMCNet). Moreover, information on object-object interactions is extracted by our Mutual Attention Network (MAN) and then incorporated into the adversarial generation process, which improves the plausibility of the generated sentences. To encourage the sentence to be semantically consistent with the image, IGGAN uses cycle consistency to make the image and the generated sentence reconstruct each other. Our proposed model can generate sentences without any manually labeled image-caption pairs. Experimental results show that it achieves quite promising performance on the MSCOCO image captioning dataset, and ablation studies validate the effectiveness of the proposed modules.
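The record gives no implementation details for RMCNet's adaptive channel attention or for IGGAN's cycle-consistency objective, so the following is only a minimal NumPy sketch of the two ideas under common-practice assumptions: a squeeze-and-excitation-style channel gate (with randomly initialized stand-in weights in place of learned ones) and an L1 reconstruction penalty for the image-sentence round trip. It is an illustration, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def channel_attention(features, reduction=4):
    """Squeeze-and-excitation-style channel attention over a (C, H, W)
    feature map: pool each channel to a scalar, pass it through a small
    bottleneck MLP, and gate the channels with sigmoid weights.
    The projection matrices here are random stand-ins for learned weights."""
    c = features.shape[0]
    squeezed = features.mean(axis=(1, 2))                # global average pool -> (C,)
    w1 = rng.standard_normal((c // reduction, c)) * 0.1  # learned in a real model
    w2 = rng.standard_normal((c, c // reduction)) * 0.1
    hidden = np.maximum(w1 @ squeezed, 0.0)              # ReLU bottleneck
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))         # sigmoid gates in (0, 1)
    return features * gates[:, None, None]               # reweight each channel

def cycle_consistency_loss(original, reconstructed):
    """Mean absolute (L1) reconstruction error: a standard penalty for
    unfaithful image -> sentence -> image (or sentence -> image -> sentence)
    round trips in cycle-consistent adversarial training."""
    return float(np.abs(original - reconstructed).mean())

feat = rng.standard_normal((16, 8, 8))   # toy (C, H, W) feature map
attended = channel_attention(feat)
loss = cycle_consistency_loss(feat, attended)
```

Because the sigmoid gates lie strictly in (0, 1), attention can only scale channels down, never flip their sign; the cycle loss is zero exactly when the reconstruction matches the original.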
Author Cao, Shan
Zheng, Zhenxing
Ruan, Qiuqi
An, Gaoyun
Author_xml – sequence: 1
  givenname: Shan
  surname: Cao
  fullname: Cao, Shan
– sequence: 2
  givenname: Gaoyun
  surname: An
  fullname: An, Gaoyun
  email: gyan@bjtu.edu.cn
– sequence: 3
  givenname: Zhenxing
  surname: Zheng
  fullname: Zheng, Zhenxing
– sequence: 4
  givenname: Qiuqi
  surname: Ruan
  fullname: Ruan, Qiuqi
CitedBy_id crossref_primary_10_1109_TMM_2023_3265842
crossref_primary_10_1109_ACCESS_2021_3056330
crossref_primary_10_1016_j_neucom_2022_05_058
crossref_primary_10_1016_j_cmpb_2023_107979
crossref_primary_10_1016_j_engappai_2023_106112
crossref_primary_10_1007_s00500_023_08544_8
crossref_primary_10_1007_s11042_023_16687_x
crossref_primary_10_1016_j_neucom_2022_10_079
crossref_primary_10_1016_j_neucom_2022_06_062
crossref_primary_10_1007_s11042_024_18680_4
crossref_primary_10_1007_s42979_021_00884_2
crossref_primary_10_1016_j_neucom_2022_06_063
crossref_primary_10_1007_s00521_024_10211_4
crossref_primary_10_1016_j_neunet_2024_106519
crossref_primary_10_1016_j_neucom_2024_127350
crossref_primary_10_1109_TMM_2022_3214090
crossref_primary_10_1007_s11042_024_18748_1
crossref_primary_10_1016_j_neucom_2022_11_045
crossref_primary_10_1109_ACCESS_2021_3129782
Cites_doi 10.1109/TIP.2018.2882225
10.1016/j.neucom.2019.12.073
10.1109/TIP.2019.2917229
10.1016/j.neucom.2018.11.004
10.18653/v1/D18-1399
10.1016/j.neucom.2018.10.059
10.1109/TIP.2012.2197631
10.1109/TGRS.2014.2307354
10.1002/cnm.2765
10.1016/j.neucom.2018.03.078
10.1109/TIP.2018.2889922
ContentType Journal Article
Copyright 2020 Elsevier B.V.
Copyright_xml – notice: 2020 Elsevier B.V.
DBID AAYXX
CITATION
DOI 10.1016/j.neucom.2020.08.019
DatabaseName CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISSN 1872-8286
EndPage 431
ExternalDocumentID 10_1016_j_neucom_2020_08_019
S0925231220312790
ISICitedReferencesCount 23
ISSN 0925-2312
IngestDate Tue Nov 18 22:26:21 EST 2025
Sat Nov 29 07:19:50 EST 2025
Fri Feb 23 02:45:57 EST 2024
IsPeerReviewed true
IsScholarly true
Keywords Unsupervised image caption
Object-object interactions
Multi-scale feature
Language English
LinkModel OpenURL
PageCount 13
ParticipantIDs crossref_citationtrail_10_1016_j_neucom_2020_08_019
crossref_primary_10_1016_j_neucom_2020_08_019
elsevier_sciencedirect_doi_10_1016_j_neucom_2020_08_019
PublicationCentury 2000
PublicationDate 2020-12-05
PublicationDateYYYYMMDD 2020-12-05
PublicationDate_xml – month: 12
  year: 2020
  text: 2020-12-05
  day: 05
PublicationDecade 2020
PublicationTitle Neurocomputing (Amsterdam)
PublicationYear 2020
Publisher Elsevier B.V
Publisher_xml – name: Elsevier B.V
References Li, Shen, Zhang, Zhang, Yuan, Yang (b0075) 2014; 52
Xian, Tian (b0275) 2019; 28
Aneja, Deshpande, Schwing (b0280) 2018
Ramanishka, Das, Zhang, Saenko (b0130) 2017
Fang, Gupta, Iandola, Srivastava, Deng, Dollar, Gao, He (b0095) 2015
Yang, Sun, Liang, Ren, Lai (b0105) 2019; 328
Chen, Liao, Chuang, Hsu (b0180) 2017
Chen, Zhang, Xiao, Nie, Shao, Liu, Chua (b0125) 2017
Dai, Fidler, Urtasun, Lin (b0155) 2017
Zhao, Xu, Yang, Ye, Zhao, Feng (b0185) 2017
Zhao, Glotin, Xie, Gao, Wu (b0070) 2012; 21
Wang, Yan, Zhang, Zhang (b0080) 2009
Lu, Xiong, Parikh, Socher (b0120) 2017
Song, Chen, Zhao, Jin (b0190) 2019
Laina, Rupprecht, Navab (b0050) 2019
Anderson, Fernando, Johnson, Gould (b0265) 2016
Yuan, Li, Lu (b0150) 2019; 330
Feng, Ma, Liu, Luo (b0205) 2019
Denkowski, Lavie (b0255) 2014
Ren, Wang, Zhang (b0175) 2017
M. Artetxe, G. Labaka, E. Agirre, K. Cho, Unsupervised neural machine translation, arXiv preprint (2017) arXiv:1710.11041.
Liu, Zhu, Ye (b0170) 2017
K. Diederik, B. Jimmy, Adam: a method for stochastic optimization, arXiv preprint (2014) arXiv:1412.6980.
X. Chen, H. Fang, T.-Y. Lin, R. Vedantam, S. Gupta, P. Dollar, C.L. Zitnick, Microsoft coco captions: data collection and evaluation server, arXiv preprint (2015) arXiv:1504.00325.
Bird, Klein, Loper (b0235) 2010; 44
Gu, Joty, Cai, Zhao, Yang, Wang (b0200) 2019
Zhao, Chang, Guo (b0040) 2019; 329
Anderson, He, Buehler, Teney, Johnson, Gould, Zhang (b0135) 2018
Gu, Cai, Joty, Niu, Wang (b0005) 2018
Lin (b0250) 2004
Karpathy, Li (b0090) 2015
Gu, Cai, Wang, Chen (b0100) 2018
Hou, Wu, Zhang, Qi, Jia, Luo (b0270) 2020
Huang, Wang, Chen, Wei (b0145) 2019
You, Jin, Wang, Fang, Luo (b0115) 2016
Shettya, Rohrbach, Hendricks, Fritza, Schiele (b0160) 2017
Liang, Bai, Zhang, Qian, Zhu, Mei (b0215) 2019
Gu, Joty, Cai, Wang (b0045) 2018
He, Zhang, Ren, Sun (b0220) 2016
Mathews, Xie, He (b0195) 2018
Li, Chen (b0140) 2018
Su, Fan, Bach (b0065) 2019
Vinyals, Toshev, Bengio, Erhan (b0085) 2015
K. Xu, J.L. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R.S. Zemel, Y. Bengio, Show, attend and tell: neural image caption generation with visual attention, arXiv preprint (2015) arXiv:1502.03044.
Guo, Liu, Yao, Li, Lu (b0290) 2019
Goceri (b0225) 2016; 32
Karpathy, Fei-Fei (b0230) 2015
Papineni, Roukos, Ward, Zhu (b0245) 2002
Gao, Li, Song, Shen (b0030) 2019; 42
Cornia, Baraldi, Cucchiara (b0015) 2019
Huang, Zhang, Zhao, Li (b0035) 2019; 28
Vedantam, Zitnick, Parikh (b0260) 2015
Yang, Zhang, Cai (b0010) 2018
Li, Zhu, Liu, Yang (b0020) 2019
Wei, Wang, Cao, Shao, Wu (b0165) 2020; 387
Yang, Zhang, Cai (b0025) 2019
G. Lample, A. Conneau, L. Denoyer, M. Ranzato, Unsupervised machine translation using monolingual corpora only, arXiv preprint (2017) arXiv:1711.00043.
Song, Yang, Zhang (b0285) 2019; 28
  article-title: Cooperative sparse representation in two opposite directions for semi-supervised image annotation
  publication-title: IEEE Transactions on Image Processing
  doi: 10.1109/TIP.2012.2197631
– volume: 44
  start-page: 421
  issue: 4
  year: 2010
  ident: 10.1016/j.neucom.2020.08.019_b0235
  article-title: Natural language processing with python, analyzing text with the natural language toolkit
  publication-title: Language Resources and Evaluation
– volume: 52
  start-page: 7086
  issue: 11
  year: 2014
  ident: 10.1016/j.neucom.2020.08.019_b0075
  article-title: Recovering quantitative remote sensing products contaminated by thick clouds and shadows using multitemporal dictionary learning
  publication-title: IEEE Transactions on Geoscience and Remote Sensing
  doi: 10.1109/TGRS.2014.2307354
– start-page: 1
  year: 2018
  ident: 10.1016/j.neucom.2020.08.019_b0140
  article-title: Image captioning with visual-semantic lstm
– start-page: 503
  year: 2018
  ident: 10.1016/j.neucom.2020.08.019_b0045
  article-title: Unpaired image captioning by language pivoting
– start-page: 2970
  year: 2017
  ident: 10.1016/j.neucom.2020.08.019_b0155
  article-title: Towards diverse and natural image descriptions via a conditional gan
– start-page: 8928
  year: 2019
  ident: 10.1016/j.neucom.2020.08.019_b0020
  article-title: Entangled transformer for image captioning
– ident: 10.1016/j.neucom.2020.08.019_b0210
– start-page: 7206
  year: 2017
  ident: 10.1016/j.neucom.2020.08.019_b0130
  article-title: Top-down visual saliency guided by captions
– start-page: 521
  year: 2017
  ident: 10.1016/j.neucom.2020.08.019_b0180
  article-title: Show, adapt and tell: adversarial training of cross-domain image captioner
– volume: 32
  issue: 11
  year: 2016
  ident: 10.1016/j.neucom.2020.08.019_b0225
  article-title: Fully automated liver segmentation using sobolev gradient-based level set evolution
  publication-title: International Journal for Numerical Methods in Biomedical Engineering
  doi: 10.1002/cnm.2765
– start-page: 4125
  year: 2019
  ident: 10.1016/j.neucom.2020.08.019_b0205
  article-title: Unsupervised image captioning
– start-page: 4135
  year: 2017
  ident: 10.1016/j.neucom.2020.08.019_b0160
  article-title: Speaking the same language: matching machine to human captions by adversarial training
– volume: 328
  start-page: 56
  year: 2019
  ident: 10.1016/j.neucom.2020.08.019_b0105
  article-title: Image captioning by incorporating affective concepts learned from both visual and textual components
  publication-title: Neurocomputing
  doi: 10.1016/j.neucom.2018.03.078
– volume: 28
  start-page: 2743
  issue: 6
  year: 2019
  ident: 10.1016/j.neucom.2020.08.019_b0285
  article-title: Topic-oriented image captioning based on order-embedding
  publication-title: IEEE Transactions on Image Processing
  doi: 10.1109/TIP.2018.2889922
– ident: 10.1016/j.neucom.2020.08.019_b0055
– start-page: 5659
  year: 2017
  ident: 10.1016/j.neucom.2020.08.019_b0125
  article-title: Sca-cnn: spatial and channel-wise attention in convolutional networks for image captioning
StartPage 419
SubjectTerms Multi-scale feature
Object-object interactions
Unsupervised image caption
Title Interactions Guided Generative Adversarial Network for unsupervised image captioning
URI https://dx.doi.org/10.1016/j.neucom.2020.08.019
Volume 417