Augmented decoding method using semantic diverse beam search for language generation model

Image captioning, the task of automatically generating natural language descriptions from visual content, has achieved remarkable accuracy in recent years. However, current approaches face a critical limitation in semantic diversity. Most diversity-oriented methods evaluate similarity at the surface...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Knowledge-based systems Jg. 329; S. 114400
Hauptverfasser: Na, HyungSun, Jun, Hee-Gook, Ahn, Jinhyun, Im, Dong-Hyuk
Format: Journal Article
Sprache:Englisch
Veröffentlicht: Elsevier B.V 04.11.2025
Schlagworte:
ISSN:0950-7051
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Image captioning, the task of automatically generating natural language descriptions from visual content, has achieved remarkable accuracy in recent years. However, current approaches face a critical limitation in semantic diversity. Most diversity-oriented methods evaluate similarity at the surface lexical level, incorrectly treating lexically different but semantically equivalent phrases (e.g., 'dog runs' vs 'canine sprints') as meaningfully diverse outputs. This superficial approach fails to capture true semantic variation. Consequently, generated captions appear different but convey essentially identical meanings. To address this fundamental limitation, we propose Semantic Diverse Beam Search (SDBS), an augmented decoding algorithm that operates in semantic space rather than surface lexical space. SDBS integrates four key innovations: knowledge graph-based semantic similarity scoring, adaptive thresholding for important word focus, statistics-based stratified top-k sampling, and beam size normalization. Additionally, we introduce an early-stop strategy that significantly reduces computational complexity while maintaining generation quality, making SDBS practically viable for real-world applications. Comprehensive experiments demonstrate that SDBS achieves superior performance on both traditional metrics and modern evaluation approaches (BARTScore++, LLM-based assessment), generating captions with genuine semantic diversity while maintaining high accuracy and computational efficiency.
ISSN:0950-7051
DOI:10.1016/j.knosys.2025.114400