Attention based Image Caption Generation (ABICG) using Encoder-Decoder Architecture

Bibliographic Details
Published in: International Conference on Smart Systems and Inventive Technology (Online), pp. 1564 - 1572
Main Authors: Kulkarni, Uday, Tomar, Kushagra, Kalmat, Mayuri, Bandi, Rakshita, Jadhav, Pranav, Meena, Sm
Format: Conference Proceeding
Language: English
Published: IEEE, 23.01.2023
Subjects:
ISSN: 2832-3017
Abstract Image captioning is used to generate sentences that describe the scenes captured in an image. Its applications are broad, yet it is a difficult task for a machine to learn what a human does naturally. The model must be built so that, when it reads a scene, it recognizes the content and produces to-the-point captions or descriptions, and the generated descriptions must be semantically and syntactically accurate. The availability of Artificial Intelligence (AI) and Machine Learning algorithms, viz. Natural Language Processing (NLP) and Deep Learning (DL), makes this task easier. Although the majority of existing machine-generated captions are valid, they do not focus on the crucial parts of the image, which reduces the clarity of the captions. The proposed paper uses Bahdanau's attention mechanism together with an Encoder-Decoder architecture to generate image captions that are more accurate and detailed. It uses a pretrained Convolutional Neural Network (CNN), the InceptionV3 architecture, to extract image features, and then a Recurrent Neural Network (RNN), the Gated Recurrent Unit (GRU), to generate the captions. The model is trained on the Flickr8k dataset, and the captions generated are 10% more accurate than the present state of the art.
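As a rough sketch of the pipeline the abstract describes (InceptionV3 features, Bahdanau additive attention, GRU decoder), the following TensorFlow code illustrates how these pieces typically fit together. This is an assumed illustration, not the authors' implementation; all class names, layer sizes, and signatures below are placeholders.

# Minimal sketch (assumed, not the paper's code): InceptionV3 encoder features
# attended by Bahdanau-style additive attention feeding a GRU decoder.
import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    """Additive attention: score = V(tanh(W1*features + W2*hidden))."""
    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)  # projects image feature vectors
        self.W2 = tf.keras.layers.Dense(units)  # projects previous decoder state
        self.V = tf.keras.layers.Dense(1)       # scalar score per spatial location

    def call(self, features, hidden):
        # features: (batch, 64, feat_dim) -- InceptionV3 8x8 grid flattened to 64 locations
        # hidden:   (batch, units)        -- previous GRU hidden state
        hidden_t = tf.expand_dims(hidden, 1)                       # (batch, 1, units)
        score = tf.nn.tanh(self.W1(features) + self.W2(hidden_t))  # (batch, 64, units)
        weights = tf.nn.softmax(self.V(score), axis=1)             # (batch, 64, 1)
        context = tf.reduce_sum(weights * features, axis=1)        # (batch, feat_dim)
        return context, weights

class GRUDecoder(tf.keras.Model):
    """One word per step: embed previous word, concatenate attention context, run GRU."""
    def __init__(self, vocab_size, embedding_dim, units):
        super().__init__()
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(units, return_sequences=True, return_state=True)
        self.fc = tf.keras.layers.Dense(vocab_size)                # logits over vocabulary
        self.attention = BahdanauAttention(units)

    def call(self, word_ids, features, hidden):
        context, weights = self.attention(features, hidden)
        x = self.embedding(word_ids)                               # (batch, 1, embedding_dim)
        x = tf.concat([tf.expand_dims(context, 1), x], axis=-1)
        output, state = self.gru(x)
        return self.fc(output[:, -1, :]), state, weights

# Encoder: pretrained InceptionV3 without its classification head; the last
# convolutional feature map (8x8x2048) is reshaped into 64 location vectors.
cnn = tf.keras.applications.InceptionV3(include_top=False, weights='imagenet')

At inference time, decoding would typically start from a start-of-sentence token and feed each predicted word back into the decoder until an end token is produced, with the attention weights indicating which image regions each word attends to.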
Author Kulkarni, Uday
Jadhav, Pranav
Kalmat, Mayuri
Bandi, Rakshita
Tomar, Kushagra
Meena, Sm
Author_xml – sequence: 1
  givenname: Uday
  surname: Kulkarni
  fullname: Kulkarni, Uday
  email: udaykulkarni@kletech.ac.in
  organization: KLE Technological University,Dept. of CSE,Hubballi,India
– sequence: 2
  givenname: Kushagra
  surname: Tomar
  fullname: Tomar, Kushagra
  email: kushagratomar2016@gmail.com
  organization: KLE Technological University,Dept. of CSE,Hubballi,India
– sequence: 3
  givenname: Mayuri
  surname: Kalmat
  fullname: Kalmat, Mayuri
  email: mayurikalmat1@gmail.com
  organization: KLE Technological University,Dept. of CSE,Hubballi,India
– sequence: 4
  givenname: Rakshita
  surname: Bandi
  fullname: Bandi, Rakshita
  email: rakshitabandi0@gmail.com
  organization: KLE Technological University,Dept. of CSE,Hubballi,India
– sequence: 5
  givenname: Pranav
  surname: Jadhav
  fullname: Jadhav, Pranav
  email: jadhavpranav250@gmail.com
  organization: KLE Technological University,Dept. of CSE,Hubballi,India
– sequence: 6
  givenname: Sm
  surname: Meena
  fullname: Meena, Sm
  email: msm@kletech.ac.in
  organization: KLE Technological University,Dept. of CSE,Hubballi,India
ContentType Conference Proceeding
DOI 10.1109/ICSSIT55814.2023.10061040
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
EISBN 166547467X
9781665474672
EISSN 2832-3017
EndPage 1572
ExternalDocumentID 10061040
Genre orig-research
IsPeerReviewed false
IsScholarly false
Language English
PageCount 9
PublicationCentury 2000
PublicationDate 2023-Jan.-23
PublicationDateYYYYMMDD 2023-01-23
PublicationDate_xml – month: 01
  year: 2023
  text: 2023-Jan.-23
  day: 23
PublicationDecade 2020
PublicationTitle International Conference on Smart Systems and Inventive Technology (Online)
PublicationTitleAbbrev ICSSIT
PublicationYear 2023
Publisher IEEE
StartPage 1564
SubjectTerms Attention mechanism
Convolutional Neural Network (CNN)
Decoder
Decoding
Encoder
Gated Recurrent Unit (GRU)
Image captioning
Machine learning algorithms
Natural language processing
Recurrent Neural Network (RNN)
Recurrent neural networks
Training
Transformers
Vocabulary
Title Attention based Image Caption Generation (ABICG) using Encoder-Decoder Architecture
URI https://ieeexplore.ieee.org/document/10061040