Attention based Image Caption Generation (ABICG) using Encoder-Decoder Architecture
| Published in: | International Conference on Smart Systems and Inventive Technology (Online) pp. 1564 - 1572 |
|---|---|
| Main Authors: | Kulkarni, Uday; Tomar, Kushagra; Kalmat, Mayuri; Bandi, Rakshita; Jadhav, Pranav; Meena, Sm |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 23.01.2023 |
| ISSN: | 2832-3017 |
| Abstract | Image captioning generates sentences that describe the scenes captured in an image. It has many applications, but it remains a hard task for a machine to learn what a human does naturally. The model must be built so that, when it reads a scene, it recognizes the contents and produces a to-the-point caption or description. The generated descriptions must be semantically and syntactically accurate. The availability of Artificial Intelligence (AI) and Machine Learning techniques, such as Natural Language Processing (NLP) and Deep Learning (DL), makes the task easier. Although the majority of existing machine-generated captions are valid, they do not focus on the crucial parts of the image, which reduces the clarity of the captions. The proposed work introduces Bahdanau's attention mechanism together with an Encoder-Decoder architecture to produce image captions that are more accurate and detailed. It uses a pretrained Convolutional Neural Network (CNN), InceptionV3, to extract image features, and then a Recurrent Neural Network (RNN), a Gated Recurrent Unit (GRU), to generate the captions. The model is trained on the Flickr8k dataset, and the generated captions are reported to be 10% more accurate than the present state of the art. |
|---|---|
| Author | 1. Kulkarni, Uday (udaykulkarni@kletech.ac.in); 2. Tomar, Kushagra (kushagratomar2016@gmail.com); 3. Kalmat, Mayuri (mayurikalmat1@gmail.com); 4. Bandi, Rakshita (rakshitabandi0@gmail.com); 5. Jadhav, Pranav (jadhavpranav250@gmail.com); 6. Meena, Sm (msm@kletech.ac.in). All authors: KLE Technological University, Dept. of CSE, Hubballi, India |
| DOI | 10.1109/ICSSIT55814.2023.10061040 |
| EISBN | 166547467X; 9781665474672 |
| PageCount | 9 |
| PublicationTitleAbbrev | ICSSIT |
| SubjectTerms | Attention mechanism; Convolutional Neural Network (CNN); Decoder; Decoding; Encoder; Gated Recurrent Unit (GRU); Image captioning; Machine learning algorithms; Natural language processing; Recurrent Neural Network (RNN); Recurrent neural networks; Training; Transformers; Vocabulary |
| URI | https://ieeexplore.ieee.org/document/10061040 |
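
The abstract names Bahdanau's (additive) attention as the link between the CNN encoder and the GRU decoder. As a rough illustration of that one step, the sketch below scores each image-region feature vector against the current decoder state, normalizes the scores with a softmax, and returns the weighted sum as a context vector. It is a minimal pure-Python sketch and not the authors' implementation: the function names, the toy sizes, and the random weights are assumptions made for illustration. In a trained model the features would come from the InceptionV3 encoder, the hidden state from the GRU, and `W1`, `W2`, and `v` would be learned.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bahdanau_attention(features, hidden, W1, W2, v):
    """Additive attention: score_i = v . tanh(W1 f_i + W2 h).

    features: list of region vectors (each of length D), assumed encoder output
    hidden:   decoder hidden state (length H), assumed GRU state
    W1: (A x D), W2: (A x H), v: length A (placeholders for learned weights)
    Returns (context, weights).
    """
    projected_hidden = matvec(W2, hidden)
    scores = []
    for feature in features:
        projected_feature = matvec(W1, feature)
        score = sum(
            vi * math.tanh(pf + ph)
            for vi, pf, ph in zip(v, projected_feature, projected_hidden)
        )
        scores.append(score)
    weights = softmax(scores)
    dim = len(features[0])
    # Context vector: attention-weighted sum of the region features.
    context = [
        sum(w * feature[d] for w, feature in zip(weights, features))
        for d in range(dim)
    ]
    return context, weights


def random_matrix(rows, cols, rng):
    """Hypothetical helper: small random weights standing in for trained ones."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Toy sizes chosen for readability only (assumptions, not values from the paper).
rng = random.Random(0)
L, D, H, A = 4, 3, 5, 6  # regions, feature dim, hidden dim, attention dim
features = random_matrix(L, D, rng)
hidden = [rng.uniform(-0.5, 0.5) for _ in range(H)]
W1 = random_matrix(A, D, rng)
W2 = random_matrix(A, H, rng)
v = [rng.uniform(-0.5, 0.5) for _ in range(A)]

context, weights = bahdanau_attention(features, hidden, W1, W2, v)
print("attention weights:", weights)
print("context vector:", context)
```

At each decoding step the context vector would typically be combined with the embedding of the previous word and fed to the GRU, so the decoder can attend to different image regions for different words.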