Contribution of recurrent connectionist language models in improving LSTM-based Arabic text recognition in videos

Detailed bibliography
Published in: Pattern Recognition, Vol. 64, pp. 245–254
Main authors: Yousfi, Sonia; Berrani, Sid-Ahmed; Garcia, Christophe
Medium: Journal Article
Language: English
Published: Elsevier Ltd, 1 April 2017
ISSN:0031-3203, 1873-5142
Abstract Unconstrained text recognition in videos is a very challenging task that has begun to draw the attention of the OCR community. For Arabic video content, however, the problem has received much less attention than for Latin script. This work presents our latest contribution to this task: recurrent connectionist language modeling to improve Long Short-Term Memory (LSTM)-based Arabic text recognition in videos. Because an LSTM OCR system already yields high recognition rates on its own, adding language models can easily degrade results if they are not integrated carefully. We therefore focus on two main factors to achieve better improvements. First, we propose using Recurrent Neural Network (RNN) language models, which are able to capture long-range linguistic dependencies. We use simple RNN models as well as models learned jointly with a Maximum Entropy language model. Second, for decoding, we do not limit ourselves to n-best rescoring of the OCR hypotheses. Instead, we propose a modified beam search algorithm that uses both OCR and language model probabilities in parallel at each decoding time step. We add a set of hyper-parameters to the algorithm to boost recognition results and to control decoding time. The method is applied to Arabic text recognition in TV broadcasts. We conduct an extensive evaluation of the method and study the impact of the language models and the decoding parameters. Results show an improvement of 16% in word recognition rate (WRR) over the baseline that uses only the OCR responses, while keeping a reasonable response time. Moreover, the proposed recurrent connectionist models outperform frequency-based models by more than 4% in WRR. The final recognition scheme outperforms a well-known commercial OCR engine by more than 36% in WRR.
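The results above are stated in word recognition rate (WRR). This record does not give the paper's exact WRR formula, so the following is only a minimal sketch under one common convention (an assumption): WRR = 1 − (word-level edit distance / number of reference words), with whitespace tokenization. The function names `edit_distance` and `word_recognition_rate` are hypothetical.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # match (cost 0) or substitution (cost 1)
            )
        prev = cur
    return prev[-1]


def word_recognition_rate(refs: List[str], hyps: List[str]) -> float:
    """Corpus-level WRR = 1 - (word edit distance / reference word count).

    Assumes whitespace tokenization and one hypothesis per reference text line.
    """
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total = sum(len(r.split()) for r in refs)
    if total == 0:
        raise ValueError("reference set contains no words")
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    return 1.0 - errors / total
```

Under this convention WRR can become negative when the OCR inserts many spurious words; some authors instead count only correctly recognized words, which keeps the score in [0, 1].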
• Different recurrent connectionist language models to improve LSTM-based Arabic text recognition in videos.
• An efficient joint decoding paradigm using language model and LSTM responses.
• Additional decoding hyper-parameters, extensively evaluated, that improve recognition results and optimize running time.
• Significant recognition improvement from integrating connectionist language models, which outperform n-gram models.
• A final Arabic OCR system that significantly outperforms a commercial OCR engine.
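The highlights mention joint decoding with language-model and LSTM responses plus extra decoding hyper-parameters. The record does not spell out the algorithm, so the sketch below shows a standard technique of this kind, CTC prefix beam search with shallow fusion of a character-level language model, under stated assumptions: the LSTM emits per-frame CTC posteriors with the blank at index 0, and the LM is a hypothetical callable `lm_logprob(prefix, char)` returning a log-probability (an RNN LM would cache its hidden state per prefix). `beam_width`, `lm_weight` and `prune` are illustrative hyper-parameters, not the paper's.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Sequence, Tuple

NEG_INF = -math.inf


def logsumexp(*xs: float) -> float:
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def joint_beam_search(
    probs: Sequence[Sequence[float]],
    alphabet: str,
    lm_logprob: Callable[[str, str], float],
    beam_width: int = 8,
    lm_weight: float = 0.5,
    prune: float = 1e-3,
) -> str:
    """CTC prefix beam search with an LM score added at every time step.

    probs:      T x (1 + len(alphabet)) per-frame posteriors; index 0 is the CTC blank.
    lm_logprob: hypothetical LM interface returning log P(char | prefix).
    beam_width, lm_weight, prune: illustrative decoding hyper-parameters.
    """
    # prefix -> (log P(prefix, path ends in blank), log P(prefix, path ends in a char))
    beams: Dict[str, Tuple[float, float]] = {"": (0.0, NEG_INF)}
    for frame in probs:
        top = max(frame)
        nxt: Dict[str, Tuple[float, float]] = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (pb, pnb) in beams.items():
            for idx, p in enumerate(frame):
                # Skip symbols far below the frame's best one to bound decoding time.
                if p <= 0.0 or p < prune * top:
                    continue
                lp = math.log(p)
                if idx == 0:  # blank: prefix unchanged, path now ends in blank
                    b, nb = nxt[prefix]
                    nxt[prefix] = (logsumexp(b, pb + lp, pnb + lp), nb)
                    continue
                c = alphabet[idx - 1]
                lm = lm_weight * lm_logprob(prefix, c)
                b, nb = nxt[prefix + c]
                if prefix and prefix[-1] == c:
                    # A repeated char only extends the prefix after a blank ...
                    nxt[prefix + c] = (b, logsumexp(nb, pb + lp + lm))
                    # ... otherwise it collapses into the current prefix (no new LM term).
                    b0, nb0 = nxt[prefix]
                    nxt[prefix] = (b0, logsumexp(nb0, pnb + lp))
                else:
                    nxt[prefix + c] = (b, logsumexp(nb, pb + lp + lm, pnb + lp + lm))
        ranked = sorted(nxt.items(), key=lambda kv: logsumexp(*kv[1]), reverse=True)
        beams = dict(ranked[:beam_width])
    return max(beams.items(), key=lambda kv: logsumexp(*kv[1]))[0]


if __name__ == "__main__":
    # Toy 3-frame output over the alphabet "ab" (index 0 = blank) with a uniform LM.
    frames = [[0.1, 0.8, 0.1], [0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
    print(joint_beam_search(frames, "ab", lambda prefix, char: math.log(0.5)))  # prints: ab
```

Unlike n-best rescoring, the LM here can change which prefixes survive the beam at every frame. A larger `lm_weight` lets the language model override the LSTM more often, while a larger `prune` or a smaller `beam_width` trades accuracy for speed.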
Author_xml – sequence: 1
  givenname: Sonia
  surname: Yousfi
  fullname: Yousfi, Sonia
  email: sonia.yousfi@orange.com
  organization: Orange Labs-France Telecom, 35510 Cesson-Sévigné, France
– sequence: 2
  givenname: Sid-Ahmed
  surname: Berrani
  fullname: Berrani, Sid-Ahmed
  email: sidahmed.berrani@orange.com
  organization: Orange Labs-France Telecom, 35510 Cesson-Sévigné, France
– sequence: 3
  givenname: Christophe
  surname: Garcia
  fullname: Garcia, Christophe
  email: christophe.garcia@liris.cnrs.fr
  organization: University of Lyon, INSA-Lyon, LIRIS, UMR5205 CNRS, 69621 Villeurbanne, France
BackLink https://hal.science/hal-01413629 (View record in HAL)
CitedBy_id crossref_primary_10_1109_ACCESS_2020_3012542
crossref_primary_10_32604_cmc_2023_034669
crossref_primary_10_1109_ACCESS_2022_3144844
crossref_primary_10_1007_s00477_025_02921_5
crossref_primary_10_3390_e21101025
crossref_primary_10_1109_ACCESS_2020_3001605
crossref_primary_10_3390_app14125021
crossref_primary_10_1007_s11042_019_07855_z
crossref_primary_10_1016_j_energy_2018_05_052
crossref_primary_10_1016_j_patcog_2020_107800
crossref_primary_10_1155_2022_9407999
crossref_primary_10_3390_fi12090156
crossref_primary_10_1109_ACCESS_2021_3053618
crossref_primary_10_1016_j_apenergy_2021_118296
crossref_primary_10_1007_s11042_022_12039_3
crossref_primary_10_1016_j_jconhyd_2023_104287
crossref_primary_10_1109_TCSVT_2018_2817642
crossref_primary_10_3390_math8040565
crossref_primary_10_1007_s10462_020_09838_1
crossref_primary_10_3390_fi10120123
crossref_primary_10_1016_j_patcog_2018_02_014
crossref_primary_10_1109_ACCESS_2021_3100717
crossref_primary_10_1007_s00521_021_05727_y
crossref_primary_10_3390_w16213125
crossref_primary_10_1145_3532609
crossref_primary_10_1016_j_patcog_2020_107791
crossref_primary_10_1109_TII_2021_3065930
crossref_primary_10_1016_j_jocs_2018_03_010
crossref_primary_10_1016_j_scs_2019_102000
crossref_primary_10_1016_j_patcog_2020_107636
crossref_primary_10_1016_j_patcog_2020_107412
crossref_primary_10_1016_j_scitotenv_2019_07_246
crossref_primary_10_1016_j_jhydrol_2022_127901
crossref_primary_10_1016_j_jocs_2017_11_011
crossref_primary_10_1371_journal_pone_0294460
crossref_primary_10_1109_ACCESS_2021_3053289
crossref_primary_10_1007_s00500_024_09930_6
crossref_primary_10_3390_app12010181
Cites_doi 10.1109/ICASSP.2011.5947611
10.1145/2505377.2505394
10.1109/DAS.2012.45
10.1016/j.patcog.2012.07.012
10.1109/ICDAR.2013.140
10.1109/ICDAR.2015.7333958
10.1016/j.patcog.2013.10.020
10.1016/j.patcog.2013.09.015
10.1109/ASRU.2009.5373380
10.1016/j.patcog.2008.05.012
10.1109/TPAMI.2008.137
10.21437/ICSLP.2002-303
10.1207/s15516709cog1402_1
10.1016/j.patcog.2014.04.025
10.3115/112405.112464
10.1109/ICIP.2014.7025612
10.1109/ACPR.2013.60
10.1007/s10032-013-0202-7
10.21437/Interspeech.2011-242
10.1109/ICDAR.2015.7333917
10.1109/CVPR.2012.6247990
10.1016/j.csl.2006.01.003
10.1109/ICASSP.2002.5743830
10.1117/12.783598
10.1109/ICCV.2013.102
ContentType Journal Article
Copyright 2016 Elsevier Ltd
Distributed under a Creative Commons Attribution 4.0 International License
DOI 10.1016/j.patcog.2016.11.011
Discipline Computer Science
EISSN 1873-5142
EndPage 254
ExternalDocumentID oai:HAL:hal-01413629v1
10_1016_j_patcog_2016_11_011
S0031320316303697
ISICitedReferencesCount 48
ISSN 0031-3203
IsPeerReviewed true
IsScholarly true
Keywords Convolutional Neural Network
Arabic video OCR
Connectionist language model
RNN
LSTM
Decoding
Language English
License Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0
ORCID 0000-0001-7997-9837
PageCount 10
PublicationDate April 2017
PublicationDateYYYYMMDD 2017-04-01
PublicationTitle Pattern recognition
PublicationYear 2017
Publisher Elsevier Ltd
Elsevier
References A. Stolcke, et al., SRILM-an extensible language modeling toolkit, in: INTERSPEECH, Denver, Colorado, USA, 2002, pp. 901–904.
N. Tomeh, N. Habash, R. Roth, N. Farra, P. Dasigi, M. Diab, Reranking with linguistic and semantic features for Arabic optical character recognition in: Annual Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, 2013, pp. 549–555.
F. Jelinek, B. Merialdo, S. Roukos, M. Strauss, A dynamic language model for speech recognition, in: DARPA Workshop on Speech and Natural Language, Pacific Grove, California, USA, 1991, pp. 293–295.
S. Kombrink, T. Mikolov, M. Karafiát, L. Burget, Recurrent neural network based language modeling in meeting recognition, in: INTERSPEECH, Florence, Italy, 2011, pp. 2877–2880.
T. Mikolov, M. Karafiát, L. Burget, J. Černocký, S. Khudanpur, Recurrent neural network based language model, in: INTERSPEECH, Makuhari, Japan, 2010, pp. 1045–1048.
S. Roy, P.P. Roy, P. Shivakumara, G. Louloudis, C.L. Tan, U. Pal, HMM-based multi-oriented text recognition in natural scene image, in: Asian Conference on Pattern Recognition, Naha, Japan, 2013, pp. 288–292.
Abbas, Smaïli, Berkani, Evaluation of topic identification methods on Arabic corpora, J. Digit. Inf. Manage. 9 (5) (2011) 185–192.
T. Mikolov, Statistical language models based on neural networks (Ph.D. thesis), Brno University of Technology, 2012.
T.M. Breuel, The OCRopus open source OCR system, in: SPIE Proceedings, 2008, p. 68150F.
A. Bissacco, M. Cummins, Y. Netzer, H. Neven, PhotoOCR: reading text in uncontrolled conditions, in: International Conference on Computer Vision, Sydney, Australia, 2013, pp. 785–792.
M.K. Saad, W. Ashour, Arabic morphological tools for text mining, in: International Symposium on Electrical and Electronics Engineering and Computer Science, Lefke, North Cyprus, 2010, pp. 112–117.
S. Yousfi, S.-A. Berrani, C. Garcia, Deep learning and recurrent connectionist-based approaches for Arabic text recognition in videos, in: International Conference on Document Analysis and Recognition, Nancy, France, 2015, pp. 1026–1030.
T.M. Breuel, A. Ul-Hasan, M.A. Al-Azawi, F. Shafait, High-performance OCR for printed English and Fraktur using LSTM networks, in: International Conference on Document Analysis and Recognition, Washington, DC, USA, 2013, pp. 683–687.
S.F. Chen, L. Mangu, B. Ramabhadran, R. Sarikaya, A. Sethy, Scaling shrinkage-based language models, in: IEEE Workshop on Automatic Speech Recognition & Understanding, Merano, Italy, 2009, pp. 299–304.
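Parvez, Mahmoud, Arabic handwriting recognition using structural and syntactic pattern attributes, Pattern Recognit. 46 (1) (2013) 141–154.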
A. Graves, Generating sequences with recurrent neural networks, arXiv preprint
A. Graves, N. Jaitly, Towards end-to-end speech recognition with recurrent neural networks, in: International Conference on Machine Learning, Beijing, China, 2014, pp. 1764–1772.
S. Yousfi, S.-A. Berrani, C. Garcia, Arabic text detection in videos using neural and boosting-based approaches: application to video indexing, in: International Conference on Image Processing, Paris, France, 2014, pp. 3028–3032.
A. Mishra, K. Alahari, C. Jawahar, Top-down and bottom-up cues for scene text recognition, in: IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 2012, pp. 785–792.
H. Schwenk, J.-L. Gauvain, Connectionist language modeling for large vocabulary continuous speech recognition, in: ICASSP, Orlando, Florida, USA, 2002, pp. 765–768.
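Bengio, Ducharme, Vincent, Jauvin, A neural probabilistic language model, J. Mach. Learn. Res. 3 (2003) 1137–1155.
Chherawala, Cheriet, Arabic word descriptor for handwritten word indexing and lexicon reduction, Pattern Recognit. 47 (10) (2014) 3477–3486.
Su, Zhang, Guan, Huang, Off-line recognition of realistic Chinese handwriting using segmentation-free strategy, Pattern Recognit. 42 (1) (2009) 167–182.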
A. Ul-Hasan, T.M. Breuel, Can we build language-independent OCR using LSTM networks?, in: International Workshop on Multilingual OCR, Washington, D.C., USA, 2013, pp. 9:1–9:5.
Ben Halima, Karray, Alimi, Vila, NF-SAVO, Int. J. Adv. Comput. Sci. Appl. 3 (10) (2012) 128–136.
S. Yousfi, S.-A. Berrani, C. Garcia, Alif: a dataset for Arabic embedded text recognition in TV broadcast, in: International Conference on Document Analysis and Recognition, Nancy, France, 2015, pp. 1221–1225.
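Elagouni, Garcia, Mamalet, Sébillot, Text recognition in multimedia documents, Int. J. Doc. Anal. Recognit. 17 (1) (2014) 19–31.
Xu, Jelinek, Random forests and the data sparseness problem in language modeling, Comput. Speech Lang. 21 (1) (2007) 105–152.
Zamora-Martínez, Frinken, España-Boquera, Castro-Bleda, Fischer, Bunke, Neural network language models for off-line handwriting recognition, Pattern Recognit. 47 (4) (2014) 1642–1652.
D.-S. Lee, R. Smith, Improving book OCR by adaptive language and image models, in: International Workshop on Document Analysis Systems, Gold Coast, Queensland, Australia, 2012, pp. 115–119.
Wang, Yin, Liu, Unsupervised language model adaptation for handwritten Chinese text recognition, Pattern Recognit. 47 (3) (2014) 1202–1216.
Elman, Finding structure in time, Cogn. Sci. 14 (2) (1990) 179–211.
Graves, Liwicki, Fernández, Bertolami, Bunke, Schmidhuber, A novel connectionist system for unconstrained handwriting recognition, IEEE Trans. Pattern Anal. Mach. Intell. 31 (5) (2009) 855–868.
T. Mikolov, A. Deoras, S. Kombrink, L. Burget, J. Černocký, Empirical evaluation and combination of advanced language modeling techniques, in: INTERSPEECH, Florence, Italy, 2011, pp. 605–608.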
StartPage 245
SubjectTerms Arabic video OCR
Artificial Intelligence
Computer Science
Computer Vision and Pattern Recognition
Connectionist language model
Convolutional Neural Network
Decoding
Document and Text Processing
Image Processing
LSTM
Machine Learning
Multimedia
Neural and Evolutionary Computing
RNN
Signal and Image Processing
Title Contribution of recurrent connectionist language models in improving LSTM-based Arabic text recognition in videos
URI https://dx.doi.org/10.1016/j.patcog.2016.11.011
https://hal.science/hal-01413629
Volume 64