Crafting Captivating Narratives: Enhancing Image Captioning Through Encoder-Decoder Architectures
| Published in: | 2024 International Conference on Data Science and Network Security (ICDSNS), pp. 1 - 6 |
|---|---|
| Main Authors: | , , , , , |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 26.07.2024 |
| Summary: | This research work aims to develop an image captioning system utilizing deep learning techniques. The pre-trained VGG-16 model is employed to extract image features, while an encoder-decoder architecture is used to generate descriptive captions. To ensure compatibility with the model, pre-processing techniques are applied to the dataset, which comprises images and their corresponding descriptions. Tokenization, vocabulary generation, and effective use of a data generator are integral parts of the training process, facilitating the handling of vast datasets. Multiple training epochs are employed to optimize the model's ability to associate captions with specific picture attributes. The model's performance is evaluated using BLEU-1 and BLEU-2 scores on a designated test set, providing quantitative insight into the quality of the generated captions. Additionally, this work includes a visualization component that lets users input the names of specific images and generates captions in real time for comparison against the ground-truth captions. This approach integrates methodologies from computer vision and natural language processing, showcasing how deep learning can enhance the interpretability and comprehension of images akin to human perception. BLEU is a widely used, automated metric for evaluating image captioning models, offering objective and quantitative comparisons. |
|---|---|
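The training setup described in the summary (tokenization, vocabulary generation, and a data generator yielding prefix/next-word pairs) can be illustrated with a minimal pure-Python sketch. All names here (`build_vocab`, `caption_pairs`, the `startseq`/`endseq` markers) are illustrative assumptions, not details taken from the paper:

```python
def build_vocab(captions):
    # Assign each distinct word an integer id; 0 is reserved for padding.
    vocab = {}
    for cap in captions:
        for word in cap.lower().split():
            vocab.setdefault(word, len(vocab) + 1)
    return vocab

def caption_pairs(image_feature, caption, vocab):
    # Expand one caption into (feature, prefix, next-word) training pairs --
    # the supervision a data generator would feed the decoder batch by batch.
    ids = [vocab[w] for w in caption.lower().split()]
    for i in range(1, len(ids)):
        yield image_feature, ids[:i], ids[i]
```

For a caption such as "startseq a dog endseq" this yields three pairs, each teaching the decoder to predict the next word from the image feature plus the caption prefix seen so far.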
| DOI: | 10.1109/ICDSNS62112.2024.10691286 |
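The BLEU-1 and BLEU-2 scores used for evaluation combine clipped n-gram precision with a brevity penalty. A simplified single-sentence sketch (hypothetical function names, no smoothing, not the paper's implementation):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, references, n):
    # Count candidate n-grams, clipping each by its maximum count in any reference.
    cand = Counter(ngrams(candidate, n))
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped / sum(cand.values())

def bleu(candidate, references, max_n):
    # Geometric mean of modified precisions up to max_n, times a brevity penalty
    # that discounts candidates shorter than the closest reference.
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    closest_ref = min(references, key=lambda r: abs(len(r) - len(candidate)))
    c, r = len(candidate), len(closest_ref)
    brevity = 1.0 if c > r else math.exp(1 - r / c)
    return brevity * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

BLEU-1 is `bleu(candidate, refs, 1)` and BLEU-2 is `bleu(candidate, refs, 2)`; a caption identical to a reference scores 1.0, and BLEU-2 is the stricter of the two because bigram matches are harder to obtain.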