Augmented decoding method using semantic diverse beam search for language generation model
Image captioning, the task of automatically generating natural language descriptions from visual content, has achieved remarkable accuracy in recent years. However, current approaches face a critical limitation in semantic diversity. Most diversity-oriented methods evaluate similarity at the surface...
| Published in: | Knowledge-based systems Vol. 329; p. 114400 |
|---|---|
| Main Authors: | Na, HyungSun; Jun, Hee-Gook; Ahn, Jinhyun; Im, Dong-Hyuk |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V, 04.11.2025 |
| ISSN: | 0950-7051 |
| Abstract | Image captioning, the task of automatically generating natural language descriptions from visual content, has achieved remarkable accuracy in recent years. However, current approaches face a critical limitation in semantic diversity. Most diversity-oriented methods evaluate similarity at the surface lexical level, incorrectly treating lexically different but semantically equivalent phrases (e.g., 'dog runs' vs 'canine sprints') as meaningfully diverse outputs. This superficial approach fails to capture true semantic variation. Consequently, generated captions appear different but convey essentially identical meanings. To address this fundamental limitation, we propose Semantic Diverse Beam Search (SDBS), an augmented decoding algorithm that operates in semantic space rather than surface lexical space. SDBS integrates four key innovations: knowledge graph-based semantic similarity scoring, adaptive thresholding for important word focus, statistics-based stratified top-k sampling, and beam size normalization. Additionally, we introduce an early-stop strategy that significantly reduces computational complexity while maintaining generation quality, making SDBS practically viable for real-world applications. Comprehensive experiments demonstrate that SDBS achieves superior performance on both traditional metrics and modern evaluation approaches (BARTScore++, LLM-based assessment), generating captions with genuine semantic diversity while maintaining high accuracy and computational efficiency. |
|---|---|
| ArticleNumber | 114400 |
| Author | Jun, Hee-Gook; Im, Dong-Hyuk; Ahn, Jinhyun; Na, HyungSun |
| Author_xml | 1. Na, HyungSun (ORCID 0000-0002-3941-3959; nayosk@kw.ac.kr; Dept. of Artificial Intelligence Convergence, Kwangwoon University, Seoul, Republic of Korea); 2. Jun, Hee-Gook (ORCID 0000-0002-3122-1696; heegook@finda.co.kr; Finda Inc., Seoul, Republic of Korea); 3. Ahn, Jinhyun (ORCID 0000-0002-2331-004X; jha@jejunu.ac.kr; Dept. of Management Information Systems, Jeju National University, Jeju, Republic of Korea); 4. Im, Dong-Hyuk (ORCID 0000-0002-0290-755X; dhim@kw.ac.kr; School of Information Convergence, Kwangwoon University, Seoul, Republic of Korea) |
| ContentType | Journal Article |
| Copyright | 2025 Elsevier B.V. |
| Copyright_xml | – notice: 2025 Elsevier B.V. |
| DBID | AAYXX CITATION |
| DOI | 10.1016/j.knosys.2025.114400 |
| DatabaseName | CrossRef |
| DatabaseTitle | CrossRef |
| Discipline | Computer Science |
| ExternalDocumentID | 10_1016_j_knosys_2025_114400 S095070512501439X |
| ISICitedReferencesCount | 0 |
| ISSN | 0950-7051 |
| IngestDate | Thu Nov 27 01:00:43 EST 2025 Wed Dec 10 14:25:30 EST 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | Decoding algorithm; Image captioning; Semantic diversity; Beam search; Knowledge graph |
| Language | English |
| LinkModel | OpenURL |
| ORCID | 0000-0002-2331-004X 0000-0002-0290-755X 0000-0002-3941-3959 0000-0002-3122-1696 |
| ParticipantIDs | crossref_primary_10_1016_j_knosys_2025_114400 elsevier_sciencedirect_doi_10_1016_j_knosys_2025_114400 |
| PublicationCentury | 2000 |
| PublicationDate | 2025-11-04 |
| PublicationDateYYYYMMDD | 2025-11-04 |
| PublicationDate_xml | – month: 11 year: 2025 text: 2025-11-04 day: 04 |
| PublicationDecade | 2020 |
| PublicationTitle | Knowledge-based systems |
| PublicationYear | 2025 |
| Publisher | Elsevier B.V |
| Publisher_xml | – name: Elsevier B.V |
| SourceID | crossref; elsevier |
| SourceType | Index Database; Publisher |
| StartPage | 114400 |
| SubjectTerms | Beam search; Decoding algorithm; Image captioning; Knowledge graph; Semantic diversity |
| Title | Augmented decoding method using semantic diverse beam search for language generation model |
| URI | https://dx.doi.org/10.1016/j.knosys.2025.114400 |
| Volume | 329 |
| openUrl | ctx_ver=Z39.88-2004; ctx_enc=info:ofi/enc:UTF-8; rfr_id=info:sid/summon.serialssolutions.com; rft_val_fmt=info:ofi/fmt:kev:mtx:journal; rft.genre=article; rft.atitle=Augmented decoding method using semantic diverse beam search for language generation model; rft.jtitle=Knowledge-based systems; rft.au=Na, HyungSun; rft.au=Jun, Hee-Gook; rft.au=Ahn, Jinhyun; rft.au=Im, Dong-Hyuk; rft.date=2025-11-04; rft.pub=Elsevier B.V; rft.issn=0950-7051; rft.volume=329; rft_id=info:doi/10.1016/j.knosys.2025.114400; rft.externalDocID=S095070512501439X |