Contribution of recurrent connectionist language models in improving LSTM-based Arabic text recognition in videos
Unconstrained text recognition in videos is a very challenging task that has begun to draw the attention of the OCR community. For Arabic video content, however, the problem has received much less attention than for Latin script. This work presents our latest contribution to this task, introducin...
| Published in: | Pattern Recognition, Vol. 64, pp. 245-254 |
|---|---|
| Main authors: | Yousfi, Sonia; Berrani, Sid-Ahmed; Garcia, Christophe |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 01.04.2017 |
| Subjects: | |
| ISSN: | 0031-3203, 1873-5142 |
| Abstract | Unconstrained text recognition in videos is a very challenging task that has begun to draw the attention of the OCR community. For Arabic video content, however, the problem has received much less attention than for Latin script. This work presents our latest contribution to this task: recurrent connectionist language modeling to improve Long Short-Term Memory (LSTM) based Arabic text recognition in videos. For an LSTM OCR system that already yields high recognition rates, introducing language models can easily degrade results. In this work, we focus on two main factors to obtain larger improvements. First, we propose using Recurrent Neural Networks (RNNs) for language modeling, since they can capture long-range linguistic dependencies. We use simple RNN models and models that are trained jointly with a Maximum Entropy language model. Second, for the decoding scheme, we are not limited to n-best rescoring of the OCR hypotheses. Instead, we propose a modified beam search algorithm that uses both OCR and language model probabilities in parallel at each decoding time-step. We introduce a set of hyper-parameters to the algorithm in order to boost recognition results and to control the decoding time. The method is applied to Arabic text recognition in TV broadcasts. We conduct an extensive evaluation of the method and study the impact of the language models and the decoding parameters. Results show an improvement of 16% in word recognition rate (WRR) over the baseline that uses only the OCR responses, while keeping a reasonable response time. Moreover, the proposed recurrent connectionist models outperform frequency-based models by more than 4% in WRR. The final recognition scheme outperforms a well-known commercial OCR engine by more than 36% in WRR.
• Different recurrent connectionist language models to improve LSTM-based Arabic text recognition in videos. • Efficient joint decoding paradigm using language model and LSTM responses. • Additional decoding hyper-parameters, extensively evaluated, that improve recognition results and optimize running time. • Significant recognition improvement from integrating connectionist language models, which outperform the contribution of n-grams. • Final Arabic OCR system that significantly outperforms a commercial OCR engine. |
|---|---|
| Author | Yousfi, Sonia Garcia, Christophe Berrani, Sid-Ahmed |
| Author_xml | – sequence: 1 givenname: Sonia surname: Yousfi fullname: Yousfi, Sonia email: sonia.yousfi@orange.com organization: Orange Labs-France Telecom, 35510 Cesson-Sévigné, France – sequence: 2 givenname: Sid-Ahmed surname: Berrani fullname: Berrani, Sid-Ahmed email: sidahmed.berrani@orange.com organization: Orange Labs-France Telecom, 35510 Cesson-Sévigné, France – sequence: 3 givenname: Christophe surname: Garcia fullname: Garcia, Christophe email: christophe.garcia@liris.cnrs.fr organization: University of Lyon, INSA-Lyon, LIRIS, UMR5205 CNRS, 69621 Villeurbanne, France |
| BackLink | https://hal.science/hal-01413629$$DView record in HAL |
| CitedBy_id | crossref_primary_10_1109_ACCESS_2020_3012542 crossref_primary_10_32604_cmc_2023_034669 crossref_primary_10_1109_ACCESS_2022_3144844 crossref_primary_10_1007_s00477_025_02921_5 crossref_primary_10_3390_e21101025 crossref_primary_10_1109_ACCESS_2020_3001605 crossref_primary_10_3390_app14125021 crossref_primary_10_1007_s11042_019_07855_z crossref_primary_10_1016_j_energy_2018_05_052 crossref_primary_10_1016_j_patcog_2020_107800 crossref_primary_10_1155_2022_9407999 crossref_primary_10_3390_fi12090156 crossref_primary_10_1109_ACCESS_2021_3053618 crossref_primary_10_1016_j_apenergy_2021_118296 crossref_primary_10_1007_s11042_022_12039_3 crossref_primary_10_1016_j_jconhyd_2023_104287 crossref_primary_10_1109_TCSVT_2018_2817642 crossref_primary_10_3390_math8040565 crossref_primary_10_1007_s10462_020_09838_1 crossref_primary_10_3390_fi10120123 crossref_primary_10_1016_j_patcog_2018_02_014 crossref_primary_10_1109_ACCESS_2021_3100717 crossref_primary_10_1007_s00521_021_05727_y crossref_primary_10_3390_w16213125 crossref_primary_10_1145_3532609 crossref_primary_10_1016_j_patcog_2020_107791 crossref_primary_10_1109_TII_2021_3065930 crossref_primary_10_1016_j_jocs_2018_03_010 crossref_primary_10_1016_j_scs_2019_102000 crossref_primary_10_1016_j_patcog_2020_107636 crossref_primary_10_1016_j_patcog_2020_107412 crossref_primary_10_1016_j_scitotenv_2019_07_246 crossref_primary_10_1016_j_jhydrol_2022_127901 crossref_primary_10_1016_j_jocs_2017_11_011 crossref_primary_10_1371_journal_pone_0294460 crossref_primary_10_1109_ACCESS_2021_3053289 crossref_primary_10_1007_s00500_024_09930_6 crossref_primary_10_3390_app12010181 |
| Cites_doi | 10.1109/ICASSP.2011.5947611 10.1145/2505377.2505394 10.1109/DAS.2012.45 10.1016/j.patcog.2012.07.012 10.1109/ICDAR.2013.140 10.1109/ICDAR.2015.7333958 10.1016/j.patcog.2013.10.020 10.1016/j.patcog.2013.09.015 10.1109/ASRU.2009.5373380 10.1016/j.patcog.2008.05.012 10.1109/TPAMI.2008.137 10.21437/ICSLP.2002-303 10.1207/s15516709cog1402_1 10.1016/j.patcog.2014.04.025 10.3115/112405.112464 10.1109/ICIP.2014.7025612 10.1109/ACPR.2013.60 10.1007/s10032-013-0202-7 10.21437/Interspeech.2011-242 10.1109/ICDAR.2015.7333917 10.1109/CVPR.2012.6247990 10.1016/j.csl.2006.01.003 10.1109/ICASSP.2002.5743830 10.1117/12.783598 10.1109/ICCV.2013.102 |
| ContentType | Journal Article |
| Copyright | 2016 Elsevier Ltd Distributed under a Creative Commons Attribution 4.0 International License |
| Copyright_xml | – notice: 2016 Elsevier Ltd – notice: Distributed under a Creative Commons Attribution 4.0 International License |
| DBID | AAYXX CITATION 1XC |
| DOI | 10.1016/j.patcog.2016.11.011 |
| DatabaseName | CrossRef Hyper Article en Ligne (HAL) |
| DatabaseTitle | CrossRef |
| DatabaseTitleList | |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISSN | 1873-5142 |
| EndPage | 254 |
| ExternalDocumentID | oai:HAL:hal-01413629v1 10_1016_j_patcog_2016_11_011 S0031320316303697 |
| ISICitedReferencesCount | 48 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000392682400020&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| ISSN | 0031-3203 |
| IngestDate | Sat Nov 29 15:08:44 EST 2025 Sat Nov 29 03:52:19 EST 2025 Tue Nov 18 21:59:22 EST 2025 Fri Feb 23 02:25:26 EST 2024 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | Convolutional Neural Network Arabic video OCR Connectionist language model RNN LSTM Decoding |
| Language | English |
| License | Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0 |
| LinkModel | OpenURL |
| ORCID | 0000-0001-7997-9837 |
| PageCount | 10 |
| ParticipantIDs | hal_primary_oai_HAL_hal_01413629v1 crossref_citationtrail_10_1016_j_patcog_2016_11_011 crossref_primary_10_1016_j_patcog_2016_11_011 elsevier_sciencedirect_doi_10_1016_j_patcog_2016_11_011 |
| PublicationCentury | 2000 |
| PublicationDate | April 2017 2017-04-00 2017-04 |
| PublicationDateYYYYMMDD | 2017-04-01 |
| PublicationDate_xml | – month: 04 year: 2017 text: April 2017 |
| PublicationDecade | 2010 |
| PublicationTitle | Pattern recognition |
| PublicationYear | 2017 |
| Publisher | Elsevier Ltd Elsevier |
| Publisher_xml | – name: Elsevier Ltd – name: Elsevier |
| SSID | ssj0017142 |
| Score | 2.4242291 |
| SourceID | hal crossref elsevier |
| SourceType | Open Access Repository Enrichment Source Index Database Publisher |
| StartPage | 245 |
| SubjectTerms | Arabic video OCR Artificial Intelligence Computer Science Computer Vision and Pattern Recognition Connectionist language model Convolutional Neural Network Decoding Document and Text Processing Image Processing LSTM Machine Learning Multimedia Neural and Evolutionary Computing RNN Signal and Image Processing |
| Title | Contribution of recurrent connectionist language models in improving LSTM-based Arabic text recognition in videos |
| URI | https://dx.doi.org/10.1016/j.patcog.2016.11.011 https://hal.science/hal-01413629 |
| Volume | 64 |
| WOSCitedRecordID | wos000392682400020&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVESC databaseName: Elsevier SD Freedom Collection Journals 2021 customDbUrl: eissn: 1873-5142 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0017142 issn: 0031-3203 databaseCode: AIEXJ dateStart: 19950101 isFulltext: true titleUrlDefault: https://www.sciencedirect.com providerName: Elsevier |
| linkProvider | Elsevier |
| thumbnail_l | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/lc.gif&issn=0031-3203&client=summon |
| thumbnail_m | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/mc.gif&issn=0031-3203&client=summon |
| thumbnail_s | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/sc.gif&issn=0031-3203&client=summon |
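
The abstract describes a beam search that combines OCR and language model probabilities at each decoding time-step instead of rescoring an n-best list afterwards. The following is a minimal sketch of that general idea, not the authors' algorithm. It assumes one OCR distribution per output character (no alignment handling), and it swaps a toy hand-written bigram model in for the RNN language model. The names `joint_beam_search`, `toy_lm`, `beam_width`, and `lm_weight` are hypothetical and do not come from the record.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def joint_beam_search(
    ocr_log_probs: Sequence[Dict[str, float]],
    lm_log_prob: Callable[[str, str], float],
    beam_width: int = 3,
    lm_weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Return the surviving hypotheses as (text, score), best first.

    ocr_log_probs: one dict per time-step mapping character -> OCR log-probability
                   (assumption: exactly one character is emitted per step).
    lm_log_prob:   function (prefix, next_char) -> language model log-probability.
    beam_width, lm_weight: illustrative hyper-parameters (hypothetical names).
    """
    beams: List[Tuple[str, float]] = [("", 0.0)]
    for step in ocr_log_probs:
        candidates: List[Tuple[str, float]] = []
        for prefix, score in beams:
            for char, ocr_lp in step.items():
                # Both knowledge sources contribute at every time-step.
                total = score + ocr_lp + lm_weight * lm_log_prob(prefix, char)
                candidates.append((prefix + char, total))
        # Keep only the best beam_width extensions; this bounds decoding time.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


def toy_lm(prefix: str, char: str) -> float:
    """Toy bigram model over {'a', 'b'}: 'b' is very likely after 'a'."""
    if prefix.endswith("a"):
        return math.log(0.9 if char == "b" else 0.1)
    return math.log(0.5)


# Two time-steps; on the second one the OCR slightly prefers 'a' over 'b'.
steps = [
    {"a": math.log(0.9), "b": math.log(0.1)},
    {"a": math.log(0.55), "b": math.log(0.45)},
]
ocr_only = joint_beam_search(steps, toy_lm, beam_width=2, lm_weight=0.0)
with_lm = joint_beam_search(steps, toy_lm, beam_width=2, lm_weight=1.0)
```

In this toy setting, setting `lm_weight` to zero reduces the search to OCR-only decoding, while a positive weight lets the language model overturn a close OCR decision. A narrower `beam_width` trades accuracy for speed, which is the kind of trade-off the abstract attributes to its decoding hyper-parameters.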