Character-level Arabic text generation from sign language video using encoder–decoder model


Detailed Bibliography
Published in: Displays, Volume 76, Article 102340
Main Authors: Boukdir, Abdelbasset; Benaddy, Mohamed; Meslouhi, Othmane El; Kardouchi, Mustapha; Akhloufi, Moulay
Format: Journal Article
Language: English
Published: Elsevier B.V., January 2023
Subjects: Arabic text; Deep learning; Gated Recurrent Unit; Pose estimation; Video caption
ISSN: 0141-9382; EISSN: 1872-7387
Online Access: Get full text
Abstract Video-to-text conversion is a vital task in the field of computer vision. In recent years, deep learning algorithms have dominated automatic text generation in English, but only a few research works are available for other languages. In this paper, we propose a novel encoder–decoder system that generates character-level Arabic sentences from isolated RGB videos of Moroccan Sign Language. The video sequence is encoded by spatiotemporal feature extraction using pose estimation models, while the label text of the video is transformed into a sequence of representative vectors. Both the features and the label vectors are joined and processed by a decoder layer to derive the final prediction. We trained the proposed system on an isolated Moroccan Sign Language dataset (MoSLD), composed of RGB videos of 125 MoSL signs. The experimental results reveal that the proposed model attains the best performance under several evaluation metrics.
Highlights:
• A method for generating character-level Arabic text from Moroccan sign language datasets is proposed.
• To the best of our knowledge, the proposed model is the first neural encoder–decoder model for Arabic video captioning.
• Different landmark estimation schemes are used at the Arabic character level to improve the accuracy and interpretive performance of the results.
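To make the pipeline in the abstract concrete, below is a minimal, hypothetical PyTorch sketch of such an encoder–decoder: per-frame pose-estimation feature vectors are encoded with a GRU (the keywords name Gated Recurrent Units), the character-level Arabic label sequence is embedded, and a GRU decoder conditioned on the video encoding predicts the next character. All names, layer sizes, and the vocabulary size are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn

class SignToCharModel(nn.Module):
    """Sketch: pose-feature encoder + character-level decoder."""

    def __init__(self, n_pose_feats=132, n_chars=40, hidden=256):
        super().__init__()
        # Encoder: GRU over per-frame pose-estimation feature vectors
        # (e.g., flattened landmark coordinates; 132 is illustrative).
        self.encoder = nn.GRU(n_pose_feats, hidden, batch_first=True)
        # Embedding turns the Arabic character IDs into "representative vectors".
        self.embed = nn.Embedding(n_chars, hidden)
        # Decoder: GRU whose initial state is the encoder's final state,
        # joining the visual features with the label vectors.
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_chars)

    def forward(self, poses, chars):
        # poses: (batch, frames, n_pose_feats); chars: (batch, seq_len)
        _, h = self.encoder(poses)        # h: (1, batch, hidden)
        emb = self.embed(chars)           # (batch, seq_len, hidden)
        dec, _ = self.decoder(emb, h)     # decode conditioned on video state
        return self.out(dec)              # (batch, seq_len, n_chars) logits

# Usage on dummy data (teacher forcing; real targets would be the
# label sequence shifted by one position).
model = SignToCharModel()
poses = torch.randn(2, 50, 132)          # 2 videos, 50 frames each
chars = torch.randint(0, 40, (2, 12))    # 2 character-ID sequences
logits = model(poses, chars)             # (2, 12, 40)
```

Conditioning the decoder's initial hidden state on the encoder's final state is one simple way to "join" the two streams; the paper's actual fusion scheme may differ.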
ArticleNumber 102340
Author Boukdir, Abdelbasset (ORCID: 0000-0002-5032-7445; abdelbasset.boukdir@edu.uiz.ac.ma), LabSI Laboratory, FSA/PFO, Ibn Zohr University, Ouarzazate, Morocco
Benaddy, Mohamed, LabSI Laboratory, FSA/PFO, Ibn Zohr University, Ouarzazate, Morocco
Meslouhi, Othmane El, SARS Group, National School of Applied Sciences - Safi, Cadi Ayyad University, Morocco
Kardouchi, Mustapha, PRIME Group, Department of Computer Sciences, Université de Moncton, Moncton, Canada
Akhloufi, Moulay, PRIME Group, Department of Computer Sciences, Université de Moncton, Moncton, Canada
ContentType Journal Article
Copyright 2022 Elsevier B.V.
DOI 10.1016/j.displa.2022.102340
Discipline Engineering
EISSN 1872-7387
ExternalDocumentID 10_1016_j_displa_2022_102340
S0141938222001585
ISICitedReferencesCount 4
ISSN 0141-9382
IsPeerReviewed true
IsScholarly true
Keywords Video caption; Deep learning; Gated Recurrent Unit; Pose estimation; Arabic text
Language English
ORCID 0000-0002-5032-7445
PublicationDate January 2023
PublicationTitle Displays
PublicationYear 2023
Publisher Elsevier B.V
StartPage 102340
Title Character-level Arabic text generation from sign language video using encoder–decoder model
URI https://dx.doi.org/10.1016/j.displa.2022.102340
Volume 76