Bangla Image Caption Generation Using Vision Transformer (ViT) Based Model

Bibliographic Details
Published in: 2025 International Conference on Electrical, Computer and Communication Engineering (ECCE), pp. 1-6
Main Authors: Sarker, Arpita; Das, Udoy; Murad, Hasan
Format: Conference Paper
Language: English
Published: IEEE, 13.02.2025
Description
Summary: In the era of digital content and visual communication, Bangla image captioning has emerged as a crucial technology for enhancing accessibility, improving content discoverability, and bridging the language gap for millions of Bangla speakers worldwide. Our work proposes a novel approach that combines a Vision Transformer as a feature extractor with a customized encoder-decoder architecture for Bangla language generation. We assess the effectiveness of our model with a range of metrics, including BLEU, ROUGE-L, and METEOR, tailored specifically for Bangla. The proposed model achieves state-of-the-art results with a BLEU score of 0.6572, a ROUGE-L score of 0.6218, and a METEOR score of 0.4513. A comparative analysis with other architectures, including Xception, ResNet101, ResNet50, and InceptionV3 combined with encoder-decoder models, highlights the advantages and drawbacks of the different approaches to Bangla image captioning.
DOI: 10.1109/ECCE64574.2025.11013210
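
The abstract describes a Vision Transformer used as an image feature extractor feeding a customized encoder-decoder that generates Bangla captions. The following PyTorch sketch shows one way such a pipeline could be wired together; the class name BanglaCaptioner, the frozen vit_b_16 backbone, the vocabulary size, decoder depth, and maximum caption length are illustrative assumptions and not details taken from the paper.

    # Hypothetical sketch of a ViT-based Bangla captioning pipeline:
    # a pretrained Vision Transformer as a frozen image feature extractor
    # feeding a small Transformer decoder that generates Bangla tokens.
    import torch
    import torch.nn as nn
    from torchvision.models import vit_b_16, ViT_B_16_Weights


    class BanglaCaptioner(nn.Module):
        def __init__(self, vocab_size: int, d_model: int = 768, max_len: int = 40):
            super().__init__()
            # Pretrained ViT backbone used only as a feature extractor (frozen).
            self.vit = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)
            self.vit.heads = nn.Identity()          # drop the classification head
            for p in self.vit.parameters():
                p.requires_grad = False

            # Token and positional embeddings for the Bangla caption tokens.
            self.embed = nn.Embedding(vocab_size, d_model)
            self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))

            # A small Transformer decoder attends to the ViT image feature.
            layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
            self.decoder = nn.TransformerDecoder(layer, num_layers=3)
            self.out = nn.Linear(d_model, vocab_size)

        def forward(self, images, captions):
            # images: (B, 3, 224, 224); captions: (B, T) Bangla token ids
            feats = self.vit(images).unsqueeze(1)   # (B, 1, d_model) global image feature
            tgt = self.embed(captions) + self.pos[:, : captions.size(1)]
            mask = nn.Transformer.generate_square_subsequent_mask(
                captions.size(1)
            ).to(captions.device)
            hid = self.decoder(tgt, feats, tgt_mask=mask)
            return self.out(hid)                    # (B, T, vocab_size) next-token logits

In such a setup the decoder would be trained with teacher forcing on tokenized Bangla captions, and generated captions would then be scored with BLEU, ROUGE-L, and METEOR, as in the evaluation reported in the abstract.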