Crafting Captivating Narratives: Enhancing Image Captioning Through Encoder-Decoder Architectures
| Published in: | 2024 International Conference on Data Science and Network Security (ICDSNS) pp. 1 - 6 |
|---|---|
| Main Authors: | Sahaaya Arul Mary S A, Adamya Kumar Pandey, Aadarsh Jha, Kartikay Sharma, Mohmad Shohel, Natarajan B |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 26.07.2024 |
| Abstract | This research work aims to develop an image captioning system using deep learning techniques. The pre-trained VGG-16 model is employed to extract image features, while an encoder-decoder architecture is used to generate descriptive captions. To ensure compatibility with the model, pre-processing techniques are applied to the dataset, which comprises images and their corresponding descriptions. Tokenization, vocabulary generation, and the use of a data generator are integral parts of the training process, facilitating the handling of vast datasets. Multiple training epochs are employed to optimize the model's ability to associate captions with specific image attributes. The model's performance is evaluated using BLEU-1 and BLEU-2 scores on a designated test set, providing quantitative insight into the quality of the generated captions. Additionally, this work includes a visualization component that lets users input the names of specific images and generates captions in real time for comparison with the actual captions. This approach integrates methodologies from computer vision and natural language processing, showcasing how deep learning can enhance the interpretability and comprehension of images in a manner akin to human perception. BLEU is a widely used, automated metric for evaluating image captioning models, offering objective and quantitative comparisons. |
|---|---|
| Author | B, Natarajan; Kumar Pandey, Adamya; Sharma, Kartikay; Arul Mary S A, Sahaaya; Jha, Aadarsh; Shohel, Mohmad |
| Author_xml | 1. Arul Mary S A, Sahaaya (samjessi@gmail.com), School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India; 2. Kumar Pandey, Adamya (adamyakumar.pandey2021@vitstudent.ac.in), School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India; 3. Jha, Aadarsh (aadarsh.jha2021@vitstudent.ac.in), School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India; 4. Sharma, Kartikay (kartikay.sharma2021@vitstudent.ac.in), School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India; 5. Shohel, Mohmad (mohmad.shohel2021@vitstudent.ac.in), School of Information Technology, Vellore Institute of Technology, Vellore, India; 6. B, Natarajan (rec.natarajan@gmail.com), School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, India |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/ICDSNS62112.2024.10691286 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Xplore; IEEE Proceedings Order Plans (POP All) 1998-Present |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Xplore url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| EISBN | 9798350373110 |
| EndPage | 6 |
| ExternalDocumentID | 10691286 |
| Genre | orig-research |
| GroupedDBID | 6IE 6IL CBEJK RIE RIL |
| ID | FETCH-LOGICAL-i91t-59fad8fab7a9d1c38bb2ab12bd4fd91872c2062f85d833b4726656dd9c6e4bc93 |
| IEDL.DBID | RIE |
| IngestDate | Wed Oct 09 06:12:58 EDT 2024 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-i91t-59fad8fab7a9d1c38bb2ab12bd4fd91872c2062f85d833b4726656dd9c6e4bc93 |
| PageCount | 6 |
| ParticipantIDs | ieee_primary_10691286 |
| PublicationCentury | 2000 |
| PublicationDate | 2024-July-26 |
| PublicationDateYYYYMMDD | 2024-07-26 |
| PublicationDate_xml | – month: 07 year: 2024 text: 2024-July-26 day: 26 |
| PublicationDecade | 2020 |
| PublicationTitle | 2024 International Conference on Data Science and Network Security (ICDSNS) |
| PublicationTitleAbbrev | ICDSNS |
| PublicationYear | 2024 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| Score | 1.8774672 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 1 |
| SubjectTerms | BLEU score; Computer architecture; Deep learning; Feature extraction; Generators; image captioning; LSTM; Measurement; natural language processing; Network security; Real-time systems; Tokenization; Training; transfer learning; VGG-16; Vocabulary |
| Title | Crafting Captivating Narratives: Enhancing Image Captioning Through Encoder-Decoder Architectures |
| URI | https://ieeexplore.ieee.org/document/10691286 |
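
The abstract reports BLEU-1 and BLEU-2 scores on a test set but this record does not say how they were computed. As an illustration only, the sketch below assumes the standard corpus-level BLEU definition (clipped n-gram precision, uniform weights, brevity penalty). The function names `ngrams` and `corpus_bleu` and the input layout are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n):
    """Cumulative corpus-level BLEU up to order max_n with uniform weights.

    Assumed (hypothetical) input layout:
      references: one entry per image, each a list of reference token lists
      hypotheses: one generated token list per image, aligned with references
    max_n=1 gives BLEU-1, max_n=2 gives BLEU-2.
    """
    matched = [0] * max_n  # clipped n-gram matches, per order
    total = [0] * max_n    # candidate n-grams, per order
    hyp_len = ref_len = 0

    for refs, hyp in zip(references, hypotheses):
        hyp_len += len(hyp)
        # Closest reference length; ties go to the shorter reference.
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            max_ref = Counter()
            for r in refs:
                max_ref |= ngrams(r, n)  # element-wise max over references
            matched[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            total[n - 1] += sum(hyp_counts.values())

    # No smoothing: any order with zero matches yields a score of 0.
    if hyp_len == 0 or min(matched) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)


# Toy example with one image, one reference caption, and one generated caption.
refs = [[["a", "dog", "runs", "on", "the", "grass"]]]
hyps = [["a", "dog", "runs", "on", "grass"]]
bleu1 = corpus_bleu(refs, hyps, max_n=1)
bleu2 = corpus_bleu(refs, hyps, max_n=2)
```

In this toy case every generated unigram appears in the reference and three of the four generated bigrams do, so BLEU-2 is lower than BLEU-1; both are scaled down by the brevity penalty because the generated caption is shorter than the reference. Library implementations with smoothing options may give different values on short captions.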