Augmented decoding method using semantic diverse beam search for language generation model

Bibliographic Details
Published in: Knowledge-Based Systems, Vol. 329, p. 114400
Main Authors: Na, HyungSun, Jun, Hee-Gook, Ahn, Jinhyun, Im, Dong-Hyuk
Format: Journal Article
Language: English
Published: Elsevier B.V., 04.11.2025
ISSN: 0950-7051
Abstract: Image captioning, the task of automatically generating natural language descriptions from visual content, has achieved remarkable accuracy in recent years. However, current approaches face a critical limitation in semantic diversity. Most diversity-oriented methods evaluate similarity at the surface lexical level, incorrectly treating lexically different but semantically equivalent phrases (e.g., 'dog runs' vs 'canine sprints') as meaningfully diverse outputs. This superficial approach fails to capture true semantic variation. Consequently, generated captions appear different but convey essentially identical meanings. To address this fundamental limitation, we propose Semantic Diverse Beam Search (SDBS), an augmented decoding algorithm that operates in semantic space rather than surface lexical space. SDBS integrates four key innovations: knowledge graph-based semantic similarity scoring, adaptive thresholding for important word focus, statistics-based stratified top-k sampling, and beam size normalization. Additionally, we introduce an early-stop strategy that significantly reduces computational complexity while maintaining generation quality, making SDBS practically viable for real-world applications. Comprehensive experiments demonstrate that SDBS achieves superior performance on both traditional metrics and modern evaluation approaches (BARTScore++, LLM-based assessment), generating captions with genuine semantic diversity while maintaining high accuracy and computational efficiency.
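The abstract's contrast between lexical and semantic diversity can be illustrated with a small sketch. This is not the authors' SDBS implementation: it assumes a hypothetical toy taxonomy (`PARENT`) in place of a real knowledge graph, uses Wu-Palmer similarity as one possible graph-based measure, and applies it as a diversity penalty in the style of diverse beam search. The names `wu_palmer`, `penalized_score`, and `strength` are illustrative assumptions.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy (child -> parent). A real system would query a
# knowledge graph such as WordNet instead of this hand-written table.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "animal": "entity",
    "canine": "animal",
    "dog": "canine",
    "feline": "animal",
    "cat": "feline",
    "vehicle": "entity",
    "car": "vehicle",
}


def path_to_root(node: str) -> List[str]:
    """Return the chain [node, parent, ..., root]."""
    path: List[str] = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path


def depth(node: str) -> int:
    """Depth in the taxonomy, counting the root as depth 1."""
    return len(path_to_root(node))


def wu_palmer(a: str, b: str) -> float:
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_b = set(path_to_root(b))
    # Walking up from a, the first node that is also an ancestor of b
    # is the deepest common subsumer of the two concepts.
    lcs = next(n for n in path_to_root(a) if n in ancestors_b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def penalized_score(log_prob: float, candidate: str,
                    earlier_choices: List[str], strength: float = 1.0) -> float:
    """Subtract a semantic penalty from a candidate's log-probability.

    Assumption: the penalty is the maximum similarity to any word already
    chosen by earlier beam groups, scaled by `strength`.
    """
    if not earlier_choices:
        return log_prob
    penalty = max(wu_palmer(candidate, w) for w in earlier_choices)
    return log_prob - strength * penalty
```

Under these assumptions, if an earlier beam group has already emitted "dog", the candidate "canine" is penalized more heavily than "car", although both differ from "dog" at the surface level. A purely lexical penalty would treat the two candidates identically.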
Author_xml – sequence: 1
  givenname: HyungSun
  orcidid: 0000-0002-3941-3959
  surname: Na
  fullname: Na, HyungSun
  email: nayosk@kw.ac.kr
  organization: Dept. of Artificial Intelligence Convergence, Kwangwoon University, Seoul, Republic of Korea
– sequence: 2
  givenname: Hee-Gook
  orcidid: 0000-0002-3122-1696
  surname: Jun
  fullname: Jun, Hee-Gook
  email: heegook@finda.co.kr
  organization: Finda Inc., Seoul, Republic of Korea
– sequence: 3
  givenname: Jinhyun
  orcidid: 0000-0002-2331-004X
  surname: Ahn
  fullname: Ahn, Jinhyun
  email: jha@jejunu.ac.kr
  organization: Dept. of Management Information Systems, Jeju National University, Jeju, Republic of Korea
– sequence: 4
  givenname: Dong-Hyuk
  orcidid: 0000-0002-0290-755X
  surname: Im
  fullname: Im, Dong-Hyuk
  email: dhim@kw.ac.kr
  organization: School of Information Convergence, Kwangwoon University, Seoul, Republic of Korea
ContentType Journal Article
Copyright 2025 Elsevier B.V.
DOI 10.1016/j.knosys.2025.114400
Discipline Computer Science
IsPeerReviewed true
IsScholarly true
Keywords Decoding algorithm
Image captioning
Semantic diversity
Beam search
Knowledge graph
  doi: 10.1109/JAS.2022.105734
– start-page: 382
  year: 2016
  ident: 10.1016/j.knosys.2025.114400_bib0050
  article-title: SPICE: semantic propositional image caption evaluation
– year: 2022
  ident: 10.1016/j.knosys.2025.114400_bib0029
  article-title: SimVLM: simple visual language model pretraining with weak supervision
– year: 2019
  ident: 10.1016/j.knosys.2025.114400_bib0030
– start-page: 325
  year: 2020
  ident: 10.1016/j.knosys.2025.114400_bib0001
  article-title: Image captioning: a comprehensive survey
– start-page: 1151
  year: 2019
  ident: 10.1016/j.knosys.2025.114400_bib0015
  article-title: Syntax-enhanced neural machine translation with syntax-aware word representations
– year: 2015
  ident: 10.1016/j.knosys.2025.114400_bib0018
  article-title: Neural machine translation by jointly learning to align and translate
– start-page: 4651
  year: 2016
  ident: 10.1016/j.knosys.2025.114400_bib0020
  article-title: Image captioning with semantic attention
– volume: 8
  start-page: 739
  year: 2018
  ident: 10.1016/j.knosys.2025.114400_bib0026
  article-title: Captioning transformer with stacked attention modules
  publication-title: Appl. Sci.
  doi: 10.3390/app8050739
– start-page: 8801
  year: 2024
  ident: 10.1016/j.knosys.2025.114400_bib0053
  article-title: Error analysis prompting enables human-like translation evaluation in large language models
– volume: 45
  start-page: 539
  year: 2023
  ident: 10.1016/j.knosys.2025.114400_bib0022
  article-title: From Show to Tell: a Survey on Deep Learning-based Image Captioning
  publication-title: IEEE Trans. Pattern Anal. Mach. Intell.
  doi: 10.1109/TPAMI.2022.3148210
– year: 2023
  ident: 10.1016/j.knosys.2025.114400_bib0035
  article-title: Contrastive Search Is What You Need For Neural Text Generation
– volume: 47
  start-page: 853
  year: 2013
  ident: 10.1016/j.knosys.2025.114400_bib0056
  article-title: Framing image description as a ranking task: data, models and evaluation metrics
  publication-title: J. Artif. Intell. Res.
  doi: 10.1613/jair.3994
StartPage 114400
SubjectTerms Beam search; Decoding algorithm; Image captioning; Knowledge graph; Semantic diversity
Title Augmented decoding method using semantic diverse beam search for language generation model
URI https://dx.doi.org/10.1016/j.knosys.2025.114400
Volume 329