An extended attention mechanism for scene text recognition
Scene text recognition (STR) refers to obtaining text information from natural text images. The task is more challenging than optical character recognition (OCR) because of the variability of scenes. The attention mechanism, which assigns different weights to each feature vector at each time step, guides...
| Published in: | Expert systems with applications Volume 203; p. 117377 |
|---|---|
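| Main authors: | Xiao, Zheng; Nie, Zhenyu; Song, Chao; Chronopoulos, Anthony Theodore |
| Format: | Journal Article |
| Language: | English |
| Publication details: | Elsevier Ltd, 01.10.2022 |
| Subjects: | Attention on attention; Deep neural network; Encoder–decoder framework; Scene text recognition |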
| ISSN: | 0957-4174, 1873-6793 |
| Abstract | Scene text recognition (STR) refers to obtaining text information from natural text images. The task is more challenging than optical character recognition (OCR) because of the variability of scenes. The attention mechanism, which assigns different weights to each feature vector at each time step, guides the text recognition decoding process. However, when the given query and the key/value are not related, the generated attention result will contain irrelevant information, which could lead the model to give wrong results. In this paper, we propose an extended attention-based framework for STR tasks. In particular, we have integrated an extended attention mechanism named Attention on Attention (AoA), which is able to determine the relevance between attention results and queries, into both the encoder and the decoder of a common text recognition framework. Using two separate linear functions, the AoA module generates an information vector and an attention gate from the attention result and the current context. AoA then applies a second attention by multiplying the two element-wise to obtain the final attended information. Our method is compared with seven benchmarks over eight datasets. Experimental results show that our method outperforms all seven benchmarks, exceeding the worst and best of them by 6.7% and 1.4% on average, respectively.
Highlights: • An extended attention mechanism is designed to recognize scene text. • Attention on Attention (AoA) is applied in the encoder and decoder modules. • The method outperforms seven benchmarks by 4.5% on average over 10 datasets. |
|---|---|
| ArticleNumber | 117377 |
| Author | Xiao, Zheng; Song, Chao; Chronopoulos, Anthony Theodore; Nie, Zhenyu |
| Author_xml | 1. Xiao, Zheng (ORCID: 0000-0003-1144-7599; email: zxiao@hnu.edu.cn), College of Information Science and Engineering, Hunan University, Changsha, China. 2. Nie, Zhenyu (ORCID: 0000-0002-7714-7903; email: niezhenyu@hnu.edu.cn), College of Information Science and Engineering, Hunan University, Changsha, China. 3. Song, Chao (ORCID: 0000-0002-8141-8585; email: song_chao@hnu.edu.cn), College of Information Science and Engineering, Hunan University, Changsha, China. 4. Chronopoulos, Anthony Theodore (ORCID: 0000-0002-0094-1017; email: anthony.chronopoulos@utsa.edu), Department of Computer Science, University of Texas, San Antonio, TX 78249, USA |
| ContentType | Journal Article |
| Copyright | 2022 Elsevier Ltd |
| DOI | 10.1016/j.eswa.2022.117377 |
| Discipline | Computer Science |
| EISSN | 1873-6793 |
| ExternalDocumentID | 10_1016_j_eswa_2022_117377 S0957417422007278 |
| ISICitedReferencesCount | 18 |
| ISSN | 0957-4174 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | Attention on attention; Deep neural network; Encoder–decoder framework; Scene text recognition |
| Language | English |
| ORCID | 0000-0002-7714-7903 0000-0002-8141-8585 0000-0003-1144-7599 0000-0002-0094-1017 |
| PublicationCentury | 2000 |
| PublicationDate | 2022-10-01 2022-10-00 |
| PublicationDateYYYYMMDD | 2022-10-01 |
| PublicationDate_xml | – month: 10 year: 2022 text: 2022-10-01 day: 01 |
| PublicationDecade | 2020 |
| PublicationTitle | Expert systems with applications |
| PublicationYear | 2022 |
| Publisher | Elsevier Ltd |
| Publisher_xml | – name: Elsevier Ltd |
| StartPage | 117377 |
| SubjectTerms | Attention on attention; Deep neural network; Encoder–decoder framework; Scene text recognition |
| Title | An extended attention mechanism for scene text recognition |
| URI | https://dx.doi.org/10.1016/j.eswa.2022.117377 |
| Volume | 203 |
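
The abstract in this record describes the Attention on Attention (AoA) module only in words: two separate linear functions produce an information vector and an attention gate from the attention result and the current context, and the two are multiplied element-wise. The following is a minimal sketch of that description, not the authors' implementation. Several details are assumptions made for illustration: the inner attention is plain scaled dot-product attention, the gate uses a sigmoid, the query and value vectors share one dimension, and all names (`attend`, `aoa`, `init_params`, the `params` keys) are hypothetical.

```python
import math
import random


def linear(weights, bias, x):
    """Affine map y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query (assumed inner attention)."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


def aoa(query, keys, values, params):
    """Gate the attention result by its relevance to the query.

    Returns the final attended information and the gate itself.
    """
    attended = attend(query, keys, values)
    # Context = attention result concatenated with the query.
    context = attended + query
    # First linear function: information vector.
    info = linear(params["w_info"], params["b_info"], context)
    # Second linear function plus sigmoid (an assumption): attention gate.
    gate = [sigmoid(z)
            for z in linear(params["w_gate"], params["b_gate"], context)]
    # Element-wise multiplication yields the final attended information.
    return [g * i for g, i in zip(gate, info)], gate


def init_params(dim, seed=0):
    """Small random weights for a query and value dimension of `dim`."""
    rng = random.Random(seed)

    def matrix():
        return [[rng.uniform(-0.1, 0.1) for _ in range(2 * dim)]
                for _ in range(dim)]

    return {"w_info": matrix(), "b_info": [0.0] * dim,
            "w_gate": matrix(), "b_gate": [0.0] * dim}
```

In this sketch a gate value near 0 suppresses a component of the information vector, which is how an attention result that is unrelated to the query would be filtered out. In a trained model the weights would be learned rather than drawn at random.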