Interactions Guided Generative Adversarial Network for unsupervised image captioning


Published in: Neurocomputing (Amsterdam), Volume 417, pp. 419-431
Main authors: Cao, Shan; An, Gaoyun; Zheng, Zhenxing; Ruan, Qiuqi
Medium: Journal Article
Language: English
Publication details: Elsevier B.V., 5 December 2020
Subjects: Unsupervised image caption; Object-object interactions; Multi-scale feature
ISSN: 0925-2312 (print); 1872-8286 (electronic)
Abstract
Highlights:
• ResNet with a new multi-scale module and adaptive channel attention (RMCNet) is proposed.
• A Mutual Attention Network (MAN) is proposed to reason about interactions among objects.
• Information on object-object interactions is incorporated into adversarial generation.
• Alignment between the image and the sentence is enforced by cycle consistency.
• An effective unsupervised image captioning model, IGGAN, is proposed.

Most current image captioning models that have achieved great success depend heavily on manually labeled image-caption pairs. However, acquiring large-scale paired data is expensive and time-consuming. In this paper, we propose the Interactions Guided Generative Adversarial Network (IGGAN) for unsupervised image captioning, which jointly exploits multi-scale feature representation and object-object interactions. To obtain a robust feature representation, the image is encoded by ResNet with a new multi-scale module and adaptive channel attention (RMCNet). Moreover, information on object-object interactions is extracted by our Mutual Attention Network (MAN) and then incorporated into the adversarial generation process, which improves the plausibility of the generated sentences. To encourage the sentence to be semantically consistent with the image, IGGAN uses cycle consistency to make the image and the generated sentence reconstruct each other. Our proposed model can generate sentences without any manually labeled image-caption pairs. Experimental results show that it achieves quite promising performance on the MSCOCO image captioning dataset, and ablation studies validate the effectiveness of the proposed modules.
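The record gives no implementation details for RMCNet's adaptive channel attention or for IGGAN's cycle-consistency objective, so the following is only a minimal NumPy sketch of the two ideas under common-practice assumptions: a squeeze-and-excitation-style channel gate (with randomly initialized stand-in weights in place of learned ones) and an L1 reconstruction penalty for the image-sentence round trip. It is an illustration, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def channel_attention(features, reduction=4):
    """Squeeze-and-excitation-style channel attention over a (C, H, W)
    feature map: pool each channel to a scalar, pass it through a small
    bottleneck MLP, and gate the channels with sigmoid weights.
    The projection matrices here are random stand-ins for learned weights."""
    c = features.shape[0]
    squeezed = features.mean(axis=(1, 2))                # global average pool -> (C,)
    w1 = rng.standard_normal((c // reduction, c)) * 0.1  # learned in a real model
    w2 = rng.standard_normal((c, c // reduction)) * 0.1
    hidden = np.maximum(w1 @ squeezed, 0.0)              # ReLU bottleneck
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))         # sigmoid gates in (0, 1)
    return features * gates[:, None, None]               # reweight each channel

def cycle_consistency_loss(original, reconstructed):
    """Mean absolute (L1) reconstruction error: a standard penalty for
    unfaithful image -> sentence -> image (or sentence -> image -> sentence)
    round trips in cycle-consistent adversarial training."""
    return float(np.abs(original - reconstructed).mean())

feat = rng.standard_normal((16, 8, 8))   # toy (C, H, W) feature map
attended = channel_attention(feat)
loss = cycle_consistency_loss(feat, attended)
```

Because the sigmoid gates lie strictly in (0, 1), attention can only scale channels down, never flip their sign; the cycle loss is zero exactly when the reconstruction matches the original.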
Author Cao, Shan
Zheng, Zhenxing
Ruan, Qiuqi
An, Gaoyun
Author_xml – sequence: 1
  givenname: Shan
  surname: Cao
  fullname: Cao, Shan
– sequence: 2
  givenname: Gaoyun
  surname: An
  fullname: An, Gaoyun
  email: gyan@bjtu.edu.cn
– sequence: 3
  givenname: Zhenxing
  surname: Zheng
  fullname: Zheng, Zhenxing
– sequence: 4
  givenname: Qiuqi
  surname: Ruan
  fullname: Ruan, Qiuqi
CitedBy_id crossref_primary_10_1109_TMM_2023_3265842
crossref_primary_10_1109_ACCESS_2021_3056330
crossref_primary_10_1016_j_neucom_2022_05_058
crossref_primary_10_1016_j_cmpb_2023_107979
crossref_primary_10_1016_j_engappai_2023_106112
crossref_primary_10_1007_s00500_023_08544_8
crossref_primary_10_1007_s11042_023_16687_x
crossref_primary_10_1016_j_neucom_2022_10_079
crossref_primary_10_1016_j_neucom_2022_06_062
crossref_primary_10_1007_s11042_024_18680_4
crossref_primary_10_1007_s42979_021_00884_2
crossref_primary_10_1016_j_neucom_2022_06_063
crossref_primary_10_1007_s00521_024_10211_4
crossref_primary_10_1016_j_neunet_2024_106519
crossref_primary_10_1016_j_neucom_2024_127350
crossref_primary_10_1109_TMM_2022_3214090
crossref_primary_10_1007_s11042_024_18748_1
crossref_primary_10_1016_j_neucom_2022_11_045
crossref_primary_10_1109_ACCESS_2021_3129782
Cites_doi 10.1109/TIP.2018.2882225
10.1016/j.neucom.2019.12.073
10.1109/TIP.2019.2917229
10.1016/j.neucom.2018.11.004
10.18653/v1/D18-1399
10.1016/j.neucom.2018.10.059
10.1109/TIP.2012.2197631
10.1109/TGRS.2014.2307354
10.1002/cnm.2765
10.1016/j.neucom.2018.03.078
10.1109/TIP.2018.2889922
ContentType Journal Article
Copyright 2020 Elsevier B.V.
Copyright_xml – notice: 2020 Elsevier B.V.
DBID AAYXX
CITATION
DOI 10.1016/j.neucom.2020.08.019
DatabaseName CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISSN 1872-8286
EndPage 431
ExternalDocumentID 10_1016_j_neucom_2020_08_019
S0925231220312790
ISICitedReferencesCount 23
ISSN 0925-2312
IngestDate Tue Nov 18 22:26:21 EST 2025
Sat Nov 29 07:19:50 EST 2025
Fri Feb 23 02:45:57 EST 2024
IsPeerReviewed true
IsScholarly true
Keywords Unsupervised image caption
Object-object interactions
Multi-scale feature
Language English
LinkModel OpenURL
PageCount 13
ParticipantIDs crossref_citationtrail_10_1016_j_neucom_2020_08_019
crossref_primary_10_1016_j_neucom_2020_08_019
elsevier_sciencedirect_doi_10_1016_j_neucom_2020_08_019
PublicationCentury 2000
PublicationDate 2020-12-05
PublicationDateYYYYMMDD 2020-12-05
PublicationDate_xml – month: 12
  year: 2020
  text: 2020-12-05
  day: 05
PublicationDecade 2020
PublicationTitle Neurocomputing (Amsterdam)
PublicationYear 2020
Publisher Elsevier B.V
Publisher_xml – name: Elsevier B.V
References Li, Shen, Zhang, Zhang, Yuan, Yang (b0075) 2014; 52
Xian, Tian (b0275) 2019; 28
Aneja, Deshpande, Schwing (b0280) 2018
Ramanishka, Das, Zhang, Saenko (b0130) 2017
Fang, Gupta, Iandola, Srivastava, Deng, Dollar, Gao, He (b0095) 2015
Yang, Sun, Liang, Ren, Lai (b0105) 2019; 328
Chen, Liao, Chuang, Hsu (b0180) 2017
Chen, Zhang, Xiao, Nie, Shao, Liu, Chua (b0125) 2017
Dai, Fidler, Urtasun, Lin (b0155) 2017
Zhao, Xu, Yang, Ye, Zhao, Feng (b0185) 2017
Zhao, Glotin, Xie, Gao, Wu (b0070) 2012; 21
Wang, Yan, Zhang, Zhang (b0080) 2009
Lu, Xiong, Parikh, Socher (b0120) 2017
Song, Chen, Zhao, Jin (b0190) 2019
Laina, Rupprecht, Navab (b0050) 2019
Anderson, Fernando, Johnson, Gould (b0265) 2016
Yuan, Li, Lu (b0150) 2019; 330
Feng, Ma, Liu, Luo (b0205) 2019
Denkowski, Lavie (b0255) 2014
Ren, Wang, Zhang (b0175) 2017
M. Artetxe, G. Labaka, E. Agirre, K. Cho, Unsupervised neural machine translation, arXiv preprint (2017) arXiv:1710.11041.
Liu, Zhu, Ye (b0170) 2017
K. Diederik, B. Jimmy, Adam: a method for stochastic optimization, arXiv preprint (2014) arXiv:1412.6980.
X. Chen, H. Fang, T.-Y. Lin, R. Vedantam, S. Gupta, P. Dollar, C.L. Zitnick, Microsoft coco captions: data collection and evaluation server, arXiv preprint (2015) arXiv:1504.00325.
Bird, Klein, Loper (b0235) 2010; 44
Gu, Joty, Cai, Zhao, Yang, Wang (b0200) 2019
Zhao, Chang, Guo (b0040) 2019; 329
Anderson, He, Buehler, Teney, Johnson, Gould, Zhang (b0135) 2018
Gu, Cai, Joty, Niu, Wang (b0005) 2018
Lin (b0250) 2004
Karpathy, Li (b0090) 2015
Gu, Cai, Wang, Chen (b0100) 2018
Hou, Wu, Zhang, Qi, Jia, Luo (b0270) 2020
Huang, Wang, Chen, Wei (b0145) 2019
You, Jin, Wang, Fang, Luo (b0115) 2016
Shettya, Rohrbach, Hendricks, Fritza, Schiele (b0160) 2017
Liang, Bai, Zhang, Qian, Zhu, Mei (b0215) 2019
Gu, Joty, Cai, Wang (b0045) 2018
He, Zhang, Ren, Sun (b0220) 2016
Mathews, Xie, He (b0195) 2018
Li, Chen (b0140) 2018
Su, Fan, Bach (b0065) 2019
Vinyals, Toshev, Bengio, Erhan (b0085) 2015
K. Xu, J.L. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R.S. Zemel, Y. Bengio, Show, attend and tell: neural image caption generation with visual attention, arXiv preprint (2015) arXiv:1502.03044.
Guo, Liu, Yao, Li, Lu (b0290) 2019
Goceri (b0225) 2016; 32
Karpathy, Fei-Fei (b0230) 2015
Papineni, Roukos, Ward, Zhu (b0245) 2002
Gao, Li, Song, Shen (b0030) 2019; 42
Cornia, Baraldi, Cucchiara (b0015) 2019
Huang, Zhang, Zhao, Li (b0035) 2019; 28
Vedantam, Zitnick, Parikh (b0260) 2015
Yang, Zhang, Cai (b0010) 2018
Li, Zhu, Liu, Yang (b0020) 2019
Wei, Wang, Cao, Shao, Wu (b0165) 2020; 387
Yang, Zhang, Cai (b0025) 2019
G. Lample, A. Conneau, L. Denoyer, M. Ranzato, Unsupervised machine translation using monolingual corpora only, arXiv preprint (2017) arXiv:1711.00043.
Song, Yang, Zhang (b0285) 2019; 28
  article-title: Cooperative sparse representation in two opposite directions for semi-supervised image annotation
  publication-title: IEEE Transactions on Image Processing
  doi: 10.1109/TIP.2012.2197631
– volume: 44
  start-page: 421
  issue: 4
  year: 2010
  ident: 10.1016/j.neucom.2020.08.019_b0235
  article-title: Natural language processing with python, analyzing text with the natural language toolkit
  publication-title: Language Resources and Evaluation
– volume: 52
  start-page: 7086
  issue: 11
  year: 2014
  ident: 10.1016/j.neucom.2020.08.019_b0075
  article-title: Recovering quantitative remote sensing products contaminated by thick clouds and shadows using multitemporal dictionary learning
  publication-title: IEEE Transactions on Geoscience and Remote Sensing
  doi: 10.1109/TGRS.2014.2307354
– start-page: 1
  year: 2018
  ident: 10.1016/j.neucom.2020.08.019_b0140
  article-title: Image captioning with visual-semantic lstm
– start-page: 503
  year: 2018
  ident: 10.1016/j.neucom.2020.08.019_b0045
  article-title: Unpaired image captioning by language pivoting
– start-page: 2970
  year: 2017
  ident: 10.1016/j.neucom.2020.08.019_b0155
  article-title: Towards diverse and natural image descriptions via a conditional gan
– start-page: 8928
  year: 2019
  ident: 10.1016/j.neucom.2020.08.019_b0020
  article-title: Entangled transformer for image captioning
– ident: 10.1016/j.neucom.2020.08.019_b0210
– start-page: 7206
  year: 2017
  ident: 10.1016/j.neucom.2020.08.019_b0130
  article-title: Top-down visual saliency guided by captions
– start-page: 521
  year: 2017
  ident: 10.1016/j.neucom.2020.08.019_b0180
  article-title: Show, adapt and tell: adversarial training of cross-domain image captioner
– volume: 32
  issue: 11
  year: 2016
  ident: 10.1016/j.neucom.2020.08.019_b0225
  article-title: Fully automated liver segmentation using sobolev gradient-based level set evolution
  publication-title: International Journal for Numerical Methods in Biomedical Engineering
  doi: 10.1002/cnm.2765
– start-page: 4125
  year: 2019
  ident: 10.1016/j.neucom.2020.08.019_b0205
  article-title: Unsupervised image captioning
– start-page: 4135
  year: 2017
  ident: 10.1016/j.neucom.2020.08.019_b0160
  article-title: Speaking the same language: matching machine to human captions by adversarial training
– volume: 328
  start-page: 56
  year: 2019
  ident: 10.1016/j.neucom.2020.08.019_b0105
  article-title: Image captioning by incorporating affective concepts learned from both visual and textual components
  publication-title: Neurocomputing
  doi: 10.1016/j.neucom.2018.03.078
– volume: 28
  start-page: 2743
  issue: 6
  year: 2019
  ident: 10.1016/j.neucom.2020.08.019_b0285
  article-title: Topic-oriented image captioning based on order-embedding
  publication-title: IEEE Transactions on Image Processing
  doi: 10.1109/TIP.2018.2889922
– ident: 10.1016/j.neucom.2020.08.019_b0055
– start-page: 5659
  year: 2017
  ident: 10.1016/j.neucom.2020.08.019_b0125
  article-title: Sca-cnn: spatial and channel-wise attention in convolutional networks for image captioning
StartPage 419
SubjectTerms Multi-scale feature
Object-object interactions
Unsupervised image caption
Title Interactions Guided Generative Adversarial Network for unsupervised image captioning
URI https://dx.doi.org/10.1016/j.neucom.2020.08.019
Volume 417