Crafting Captivating Narratives: Enhancing Image Captioning Through Encoder-Decoder Architectures

This research work aims to develop an image captioning system utilizing deep learning techniques. The pre-trained VGG-16 model is employed to extract image features, while an innovative encoder-decoder architecture is used to generate descriptive captions. To ensure compatibility with the model, pre...

Bibliographic Details
Published in: 2024 International Conference on Data Science and Network Security (ICDSNS), pp. 1-6
Main Authors: Arul Mary S A, Sahaaya, Kumar Pandey, Adamya, Jha, Aadarsh, Sharma, Kartikay, Shohel, Mohmad, B, Natarajan
Format: Conference Proceeding
Language: English
Published: IEEE 26.07.2024
Abstract This research work aims to develop an image captioning system utilizing deep learning techniques. The pre-trained VGG-16 model is employed to extract image features, while an innovative encoder-decoder architecture is used to generate descriptive captions. To ensure compatibility with the model, pre-processing techniques are applied to the dataset, which comprises images and their corresponding descriptions. Tokenization, vocabulary generation, and the effective use of a data generator are integral parts of the training process, facilitating the handling of vast datasets. Multiple training epochs are employed to optimize the model's ability to associate captions with specific picture attributes. The model's performance is evaluated using BLEU-1 and BLEU-2 scores on a designated test set, providing quantitative insights into the quality of the generated captions. Additionally, this work includes a visualization component that lets users input the names of specific images and generates captions in real time for comparison with the actual captions. This comprehensive approach integrates methodologies from computer vision and natural language processing, showcasing how deep learning can enhance the interpretability and comprehension of images akin to human perception. BLEU is a widely used, automated metric for evaluating image captioning models, offering objective and quantitative comparisons.
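The abstract names tokenization, vocabulary generation, and a data generator but this record gives no implementation details. The following is a minimal sketch of how those three steps commonly fit together, not the authors' code. The function names, the `startseq`/`endseq` markers, the left-padding scheme, and the use of an image ID in place of a VGG-16 feature vector are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterator, List, Tuple

# Assumed sentinel tokens marking caption boundaries (not taken from the paper).
START, END = "startseq", "endseq"


def tokenize(caption: str) -> List[str]:
    """Lowercase, keep purely alphabetic words, and wrap with start/end markers."""
    words = [w for w in caption.lower().split() if w.isalpha()]
    return [START] + words + [END]


def build_vocab(captions: List[str]) -> Dict[str, int]:
    """Map each token to an integer id; id 0 is reserved for padding."""
    counts = Counter(tok for c in captions for tok in tokenize(c))
    return {tok: i + 1 for i, tok in enumerate(sorted(counts))}


def data_generator(
    pairs: List[Tuple[str, str]],  # (image_id, caption) pairs
    vocab: Dict[str, int],
    max_len: int,
) -> Iterator[Tuple[str, List[int], int]]:
    """Yield (image_id, left-padded prefix ids, next-word id) one sample at a time.

    Producing samples lazily avoids holding every prefix of every caption
    in memory at once, which is the usual reason for using a generator.
    """
    for image_id, caption in pairs:
        ids = [vocab[t] for t in tokenize(caption)]
        for i in range(1, len(ids)):
            prefix = ids[:i][-max_len:]
            padded = [0] * (max_len - len(prefix)) + prefix
            yield image_id, padded, ids[i]
```

In a full system the image ID would be replaced by the precomputed image feature vector, and samples would be grouped into batches before being passed to the decoder. Note that this simple tokenizer drops any word with attached punctuation, so real pre-processing would normally strip punctuation first.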
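For readers unfamiliar with the BLEU-1 and BLEU-2 scores mentioned in the abstract, the sketch below shows how they are defined: clipped (modified) n-gram precision, combined by a geometric mean with uniform weights and scaled by a brevity penalty. It is a sentence-level, unsmoothed illustration with hypothetical function names; the record does not say which BLEU implementation or aggregation level the authors used.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(refs: List[List[str]], hyp: List[str], n: int) -> float:
    """Clipped n-gram precision: each hypothesis n-gram counts at most as
    often as it appears in the reference that contains it most often."""
    hyp_counts = ngrams(hyp, n)
    if not hyp_counts:
        return 0.0
    max_ref: Counter = Counter()
    for ref in refs:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
    return clipped / sum(hyp_counts.values())


def bleu(refs: List[List[str]], hyp: List[str], max_n: int) -> float:
    """BLEU-max_n with uniform weights and no smoothing."""
    precisions = [modified_precision(refs, hyp, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    # Brevity penalty uses the reference length closest to the hypothesis length.
    ref_len = min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
    bp = 1.0 if len(hyp) > ref_len else math.exp(1 - ref_len / len(hyp))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

With the reference "a dog runs on grass" and the generated caption "a dog runs on sand", four of five unigrams and three of four bigrams match, so `bleu(refs, hyp, 1)` is 0.8 and `bleu(refs, hyp, 2)` is the square root of 0.8 × 0.75, about 0.775.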
Author B, Natarajan
Kumar Pandey, Adamya
Sharma, Kartikay
Arul Mary S A, Sahaaya
Jha, Aadarsh
Shohel, Mohmad
Author_xml – sequence: 1
  givenname: Sahaaya
  surname: Arul Mary S A
  fullname: Arul Mary S A, Sahaaya
  email: samjessi@gmail.com
  organization: School of Computer Science and Engineering, Vellore Institute of Technology,Vellore,India
– sequence: 2
  givenname: Adamya
  surname: Kumar Pandey
  fullname: Kumar Pandey, Adamya
  email: adamyakumar.pandey2021@vitstudent.ac.in
  organization: School of Computer Science and Engineering, Vellore Institute of Technology,Vellore,India
– sequence: 3
  givenname: Aadarsh
  surname: Jha
  fullname: Jha, Aadarsh
  email: aadarsh.jha2021@vitstudent.ac.in
  organization: School of Computer Science and Engineering, Vellore Institute of Technology,Vellore,India
– sequence: 4
  givenname: Kartikay
  surname: Sharma
  fullname: Sharma, Kartikay
  email: kartikay.sharma2021@vitstudent.ac.in
  organization: School of Computer Science and Engineering, Vellore Institute of Technology,Vellore,India
– sequence: 5
  givenname: Mohmad
  surname: Shohel
  fullname: Shohel, Mohmad
  email: mohmad.shohel2021@vitstudent.ac.in
  organization: School of Information Technology, Vellore Institute of Technology,Vellore,India
– sequence: 6
  givenname: Natarajan
  surname: B
  fullname: B, Natarajan
  email: rec.natarajan@gmail.com
  organization: School of Computer Science and Engineering, Vellore Institute of Technology,Chennai,India
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/ICDSNS62112.2024.10691286
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Xplore
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Xplore
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9798350373110
EndPage 6
ExternalDocumentID 10691286
Genre orig-research
GroupedDBID 6IE
6IL
CBEJK
RIE
RIL
ID FETCH-LOGICAL-i91t-59fad8fab7a9d1c38bb2ab12bd4fd91872c2062f85d833b4726656dd9c6e4bc93
IEDL.DBID RIE
IngestDate Wed Oct 09 06:12:58 EDT 2024
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-i91t-59fad8fab7a9d1c38bb2ab12bd4fd91872c2062f85d833b4726656dd9c6e4bc93
PageCount 6
ParticipantIDs ieee_primary_10691286
PublicationCentury 2000
PublicationDate 2024-July-26
PublicationDateYYYYMMDD 2024-07-26
PublicationDate_xml – month: 07
  year: 2024
  text: 2024-July-26
  day: 26
PublicationDecade 2020
PublicationTitle 2024 International Conference on Data Science and Network Security (ICDSNS)
PublicationTitleAbbrev ICDSNS
PublicationYear 2024
Publisher IEEE
Publisher_xml – name: IEEE
Score 1.8774672
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms BLEU score
Computer architecture
Deep learning
Feature extraction
Generators
image captioning
LSTM
Measurement
natural language processing
Network security
Real-time systems
Tokenization
Training
transfer learning
VGG-16
Vocabulary
Title Crafting Captivating Narratives: Enhancing Image Captioning Through Encoder-Decoder Architectures
URI https://ieeexplore.ieee.org/document/10691286
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE