Crafting Captivating Narratives: Enhancing Image Captioning Through Encoder-Decoder Architectures

Detailed Description

Bibliographic Details
Published in: 2024 International Conference on Data Science and Network Security (ICDSNS), pp. 1-6
Authors: Sahaaya Arul Mary S A, Adamya Kumar Pandey, Aadarsh Jha, Kartikay Sharma, Mohmad Shohel, Natarajan B
Format: Conference paper
Language: English
Published: IEEE, 26 July 2024
Online access: Full text
Description
Abstract: This research work aims to develop an image captioning system utilizing deep learning techniques. The pre-trained VGG-16 model is employed to extract image features, while an innovative encoder-decoder architecture generates descriptive captions. To ensure compatibility with the model, pre-processing techniques are applied to the dataset, which comprises images and their corresponding descriptions. Tokenization, vocabulary generation, and the effective use of a data generator are integral parts of the training process, facilitating the handling of vast datasets. Multiple training epochs are employed to optimize the model's ability to associate captions with specific picture attributes. The model's performance is evaluated using BLEU-1 and BLEU-2 scores on a designated test set, providing quantitative insight into the quality of the generated captions. Additionally, this work includes a visualization component that lets users input the names of specific images and generates captions in real time for comparison with the actual captions. This comprehensive approach integrates methodologies from computer vision and natural language processing, showcasing how deep learning can enhance the interpretability and comprehension of images, approaching human perception. BLEU is a widely used automated metric for evaluating image captioning models, offering objective and quantitative comparisons.
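The tokenization and vocabulary-generation step described in the abstract can be sketched in pure Python as follows. This is a minimal illustration, not the paper's implementation: the helper names (`build_vocab`, `encode`) and the `<pad>`/`<start>`/`<end>` special tokens are common conventions assumed here, not details taken from the paper.

```python
from collections import Counter

def build_vocab(captions, min_count=1):
    """Build a word-to-index vocabulary from an iterable of caption strings.

    Index 0 is reserved for padding; start/end markers are added so the
    decoder can learn where captions begin and terminate.
    """
    counter = Counter()
    for cap in captions:
        counter.update(cap.lower().split())
    vocab = {"<pad>": 0, "<start>": 1, "<end>": 2}
    for word, count in counter.items():
        if count >= min_count:
            vocab.setdefault(word, len(vocab))
    return vocab

def encode(caption, vocab):
    """Map a caption to token indices, wrapped in start/end markers."""
    tokens = ["<start>"] + caption.lower().split() + ["<end>"]
    return [vocab[t] for t in tokens if t in vocab]
```

In a full pipeline these index sequences would be padded to a fixed length and yielded batch-by-batch from a data generator, which is what allows large datasets to be trained on without loading everything into memory at once.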
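The BLEU-1 and BLEU-2 evaluation mentioned above can be illustrated with a small self-contained sketch of clipped n-gram precision plus a brevity penalty. This is a simplified single-sentence version for clarity; practical evaluations would normally use an established implementation such as NLTK's `sentence_bleu`/`corpus_bleu`.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, references, max_n=2):
    """Simplified BLEU-N for one candidate caption against reference captions.

    Uses clipped n-gram precision (counts capped at the maximum seen in any
    reference) and the standard brevity penalty. max_n=1 gives BLEU-1,
    max_n=2 gives BLEU-2.
    """
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        max_ref = Counter()
        for ref in refs:
            for g, c in Counter(ngrams(ref, n)).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        precisions.append(clipped / max(sum(cand_counts.values()), 1))
    if min(precisions) == 0:
        return 0.0
    # Brevity penalty against the closest reference length.
    ref_len = min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

A perfect match scores 1.0, a caption sharing no n-grams with any reference scores 0.0, and shorter-but-correct captions are discounted by the brevity penalty, which is what makes BLEU an objective, quantitative way to compare generated captions against the ground truth.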
DOI: 10.1109/ICDSNS62112.2024.10691286