Crafting Captivating Narratives: Enhancing Image Captioning Through Encoder-Decoder Architectures
| Published in: | 2024 International Conference on Data Science and Network Security (ICDSNS), pp. 1 - 6 |
|---|---|
| Main Authors: | , , , , , |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 26.07.2024 |
| Summary: | This research work aims to develop an image captioning system utilizing deep learning techniques. The pre-trained VGG-16 model is employed to extract image features, while an encoder-decoder architecture is used to generate descriptive captions. To ensure compatibility with the model, pre-processing techniques are applied to the dataset, which comprises images and their corresponding descriptions. Tokenization, vocabulary generation, and effective use of a data generator are integral parts of the training process, facilitating the handling of vast datasets. Multiple training epochs are employed to optimize the model's ability to associate captions with specific picture attributes. The model's performance is evaluated using BLEU-1 and BLEU-2 scores on a designated test set, providing quantitative insight into the quality of the generated captions. Additionally, this work includes a visualization component that lets users input the names of specific images and generates captions in real time for comparison against the ground-truth captions. This approach integrates methodologies from computer vision and natural language processing, showcasing how deep learning can enhance the interpretability and comprehension of images akin to human perception. BLEU is a widely used, automated metric for evaluating image captioning models, offering objective and quantitative comparisons. |
|---|---|
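The training setup described in the summary (tokenization, vocabulary generation, and a data generator yielding prefix/next-word pairs) can be illustrated with a minimal pure-Python sketch. All names here (`build_vocab`, `caption_pairs`, the `startseq`/`endseq` markers) are illustrative assumptions, not details taken from the paper:

```python
def build_vocab(captions):
    # Assign each distinct word an integer id; 0 is reserved for padding.
    vocab = {}
    for cap in captions:
        for word in cap.lower().split():
            vocab.setdefault(word, len(vocab) + 1)
    return vocab

def caption_pairs(image_feature, caption, vocab):
    # Expand one caption into (feature, prefix, next-word) training pairs --
    # the supervision a data generator would feed the decoder batch by batch.
    ids = [vocab[w] for w in caption.lower().split()]
    for i in range(1, len(ids)):
        yield image_feature, ids[:i], ids[i]
```

For a caption such as "startseq a dog endseq" this yields three pairs, each teaching the decoder to predict the next word from the image feature plus the caption prefix seen so far.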
| DOI: | 10.1109/ICDSNS62112.2024.10691286 |
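The BLEU-1 and BLEU-2 scores used for evaluation combine clipped n-gram precision with a brevity penalty. A simplified single-sentence sketch (hypothetical function names, no smoothing, not the paper's implementation):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, references, n):
    # Count candidate n-grams, clipping each by its maximum count in any reference.
    cand = Counter(ngrams(candidate, n))
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped / sum(cand.values())

def bleu(candidate, references, max_n):
    # Geometric mean of modified precisions up to max_n, times a brevity penalty
    # that discounts candidates shorter than the closest reference.
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    closest_ref = min(references, key=lambda r: abs(len(r) - len(candidate)))
    c, r = len(candidate), len(closest_ref)
    brevity = 1.0 if c > r else math.exp(1 - r / c)
    return brevity * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

BLEU-1 is `bleu(candidate, refs, 1)` and BLEU-2 is `bleu(candidate, refs, 2)`; a caption identical to a reference scores 1.0, and BLEU-2 is the stricter of the two because bigram matches are harder to obtain.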