Augmented decoding method using semantic diverse beam search for language generation model

Bibliographic Details
Published in:	Knowledge-based systems Vol. 329; p. 114400
Main Authors:	Na, HyungSun; Jun, Hee-Gook; Ahn, Jinhyun; Im, Dong-Hyuk
Format:	Journal Article
Language:	English
Published:	Elsevier B.V. 04.11.2025
Subjects:	Beam search; Decoding algorithm; Image captioning; Knowledge graph; Semantic diversity
ISSN:	0950-7051

Abstract	Image captioning, the task of automatically generating natural language descriptions from visual content, has achieved remarkable accuracy in recent years. However, current approaches face a critical limitation in semantic diversity. Most diversity-oriented methods evaluate similarity at the surface lexical level, incorrectly treating lexically different but semantically equivalent phrases (e.g., 'dog runs' vs 'canine sprints') as meaningfully diverse outputs. This superficial approach fails to capture true semantic variation. Consequently, generated captions appear different but convey essentially identical meanings. To address this fundamental limitation, we propose Semantic Diverse Beam Search (SDBS), an augmented decoding algorithm that operates in semantic space rather than surface lexical space. SDBS integrates four key innovations: knowledge graph-based semantic similarity scoring, adaptive thresholding for important word focus, statistics-based stratified top-k sampling, and beam size normalization. Additionally, we introduce an early-stop strategy that significantly reduces computational complexity while maintaining generation quality, making SDBS practically viable for real-world applications. Comprehensive experiments demonstrate that SDBS achieves superior performance on both traditional metrics and modern evaluation approaches (BARTScore++, LLM-based assessment), generating captions with genuine semantic diversity while maintaining high accuracy and computational efficiency.
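Illustrative sketch (not the authors' implementation): the abstract's central idea is that a diversity penalty should be computed in semantic space rather than by exact token match. The Python below shows one way this could look, assuming a Wu-Palmer similarity over a tiny hand-written taxonomy and a penalty that sums similarity to tokens already chosen by earlier beam groups. The taxonomy, the function names (path_to_root, wup, rerank) and the strength parameter are all hypothetical. Adaptive thresholding, stratified top-k sampling, beam size normalization and early stopping are not modeled.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy is-a taxonomy (child -> parent). A real system would query
# a knowledge graph instead of this hand-written table.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "animal": "entity",
    "artifact": "entity",
    "canine": "animal",
    "feline": "animal",
    "dog": "canine",
    "cat": "feline",
    "ball": "artifact",
}


def path_to_root(word: str) -> List[str]:
    """Return [word, parent, ..., root] for a word in the taxonomy."""
    path: List[str] = []
    node: Optional[str] = word
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def wup(a: str, b: str) -> float:
    """Wu-Palmer similarity: 2*depth(lcs) / (depth(a)+depth(b)), root depth 1."""
    if a not in PARENT or b not in PARENT:
        return 0.0  # assumption: out-of-taxonomy words count as unrelated
    path_a, path_b = path_to_root(a), path_to_root(b)
    ancestors_a = set(path_a)
    # First node on b's path that is also an ancestor of a (lowest common subsumer).
    lcs = next(node for node in path_b if node in ancestors_a)
    return 2 * len(path_to_root(lcs)) / (len(path_a) + len(path_b))


def rerank(
    candidates: Dict[str, float], chosen: List[str], strength: float
) -> List[Tuple[str, float]]:
    """Subtract a semantic penalty from each candidate's log-probability.

    candidates maps token -> log-probability; chosen holds tokens already
    picked by earlier beam groups at this step. Returns best-first order.
    """
    scored = [
        (tok, logp - strength * sum(wup(tok, prev) for prev in chosen))
        for tok, logp in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # An earlier group already emitted "dog". A purely lexical penalty would
    # leave "canine" on top because its surface form differs from "dog".
    ranked = rerank({"canine": -0.9, "cat": -1.0, "ball": -1.4}, ["dog"], 1.0)
    for token, score in ranked:
        print(f"{token}: {score:.3f}")
```

With these toy numbers the near-synonym "canine" drops from the highest raw log-probability to the lowest adjusted score, while "cat" moves to the top, which is the behavior a semantic-space penalty is meant to produce.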
ArticleNumber	114400
Author	Jun, Hee-Gook Im, Dong-Hyuk Ahn, Jinhyun Na, HyungSun
Author_xml	– sequence: 1 givenname: HyungSun orcidid: 0000-0002-3941-3959 surname: Na fullname: Na, HyungSun email: nayosk@kw.ac.kr organization: Dept. of Artificial Intelligence Convergence, Kwangwoon University, Seoul, Republic of Korea – sequence: 2 givenname: Hee-Gook orcidid: 0000-0002-3122-1696 surname: Jun fullname: Jun, Hee-Gook email: heegook@finda.co.kr organization: Finda Inc., Seoul, Republic of Korea – sequence: 3 givenname: Jinhyun orcidid: 0000-0002-2331-004X surname: Ahn fullname: Ahn, Jinhyun email: jha@jejunu.ac.kr organization: Dept. of Management Information Systems, Jeju National University, Jeju, Republic of Korea – sequence: 4 givenname: Dong-Hyuk orcidid: 0000-0002-0290-755X surname: Im fullname: Im, Dong-Hyuk email: dhim@kw.ac.kr organization: School of Information Convergence, Kwangwoon University, Seoul, Republic of Korea
Cites_doi	10.3390/app11020826 10.3991/ijet.v14i24.12187 10.1162/tacl_a_00166 10.1007/s11042-014-1855-z 10.1007/s11263-015-0816-y 10.1007/s13218-020-00679-2 10.1109/TPAMI.2020.3013834 10.1145/219717.219748 10.1109/JAS.2022.105734 10.3390/app8050739 10.1109/TPAMI.2022.3148210 10.1613/jair.3994
ContentType	Journal Article
Copyright	2025 Elsevier B.V.
Copyright_xml	– notice: 2025 Elsevier B.V.
DBID	AAYXX CITATION
DOI	10.1016/j.knosys.2025.114400
DatabaseName	CrossRef
DatabaseTitle	CrossRef
DatabaseTitleList
DeliveryMethod	fulltext_linktorsrc
Discipline	Computer Science
ExternalDocumentID	10_1016_j_knosys_2025_114400 S095070512501439X
ID	FETCH-LOGICAL-c257t-e1d0bb29dbfdd724024a178c0ebeea03a161d0fff2a46a6698aebff2da410ea13
ISICitedReferencesCount	0
ISICitedReferencesURI	http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001572118500002&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN	0950-7051
IngestDate	Thu Nov 27 01:00:43 EST 2025 Wed Dec 10 14:25:30 EST 2025
IsPeerReviewed	true
IsScholarly	true
Keywords	Decoding algorithm Image captioning Semantic diversity Beam search Knowledge graph
Language	English
LinkModel	OpenURL
MergedId	FETCHMERGED-LOGICAL-c257t-e1d0bb29dbfdd724024a178c0ebeea03a161d0fff2a46a6698aebff2da410ea13
ORCID	0000-0002-2331-004X 0000-0002-0290-755X 0000-0002-3941-3959 0000-0002-3122-1696
ParticipantIDs	crossref_primary_10_1016_j_knosys_2025_114400 elsevier_sciencedirect_doi_10_1016_j_knosys_2025_114400
PublicationCentury	2000
PublicationDate	2025-11-04
PublicationDateYYYYMMDD	2025-11-04
PublicationDate_xml	– month: 11 year: 2025 text: 2025-11-04 day: 04
PublicationDecade	2020
PublicationTitle	Knowledge-based systems
PublicationYear	2025
Publisher	Elsevier B.V
Publisher_xml	– name: Elsevier B.V
SourceID	crossref elsevier
SourceType	Index Database Publisher
StartPage	114400
SubjectTerms	Beam search Decoding algorithm Image captioning Knowledge graph Semantic diversity
Title	Augmented decoding method using semantic diverse beam search for language generation model
URI	https://dx.doi.org/10.1016/j.knosys.2025.114400
Volume	329