An extended attention mechanism for scene text recognition

Scene text recognition (STR) refers to obtaining text information from natural text images. The task is more challenging than optical character recognition (OCR) due to the variability of scenes. An attention mechanism, which assigns different weights to each feature vector at each time step, guides...

Bibliographic details
Published in: Expert Systems with Applications, Vol. 203; p. 117377
Main authors: Xiao, Zheng; Nie, Zhenyu; Song, Chao; Chronopoulos, Anthony Theodore
Format: Journal Article
Language: English
Published: Elsevier Ltd, 1 October 2022
ISSN:0957-4174, 1873-6793
Abstract Scene text recognition (STR) is the task of extracting text from images of natural scenes. It is more challenging than optical character recognition (OCR) because scenes vary widely. An attention mechanism, which assigns a different weight to each feature vector at each time step, guides the decoding process in text recognition. However, when the given query and the key/value pairs are unrelated, the attention result contains irrelevant information, which can lead the model to produce wrong results. In this paper, we propose an extended attention-based framework for STR tasks. In particular, we integrate an extended attention mechanism named Attention on Attention (AoA), which determines the relevance between attention results and queries, into both the encoder and the decoder of a common text recognition framework. Using two separate linear functions, the AoA module generates an information vector and an attention gate from the attention result and the current context. AoA then adds a second attention step by applying element-wise multiplication to obtain the final attended information. Our method is compared with seven benchmarks on eight datasets. Experimental results show that it outperforms all seven benchmarks, by 6.7% over the worst and 1.4% over the best on average.
• An extended attention mechanism is designed to recognize scene text.
• Attention on Attention (AoA) is applied in the encoder and decoder modules.
• The method outperforms seven benchmarks by 4.5% on average over 10 datasets.
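The gating step described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which base attention is used, so scaled dot-product attention for a single query is assumed here; the sigmoid on the gate follows the original Attention on Attention formulation for image captioning (Huang et al., 2019); and all names (`attention_on_attention`, `w_info`, `w_gate`, and so on) are hypothetical.

```python
import math


def linear(weights, bias, x):
    """Affine map W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Assumed base attention: scaled dot-product for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


def attention_on_attention(query, keys, values, params):
    """Gate the attention result by its relevance to the query.

    params is a hypothetical dict holding two separate linear maps,
    each taking the concatenation [attention result, query].
    """
    attended = attention(query, keys, values)
    context = attended + query  # list concatenation
    # First linear function: the information vector.
    info = linear(params["w_info"], params["b_info"], context)
    # Second linear function followed by a sigmoid: the attention gate.
    gate = [sigmoid(z)
            for z in linear(params["w_gate"], params["b_gate"], context)]
    # Element-wise multiplication gives the final attended information.
    return [g * i for g, i in zip(gate, info)]
```

Because each gate entry lies strictly between 0 and 1, every component of the output is a scaled-down copy of the corresponding information component; a gate near 0 suppresses an attention result that is irrelevant to the query.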
ArticleNumber 117377
Author Xiao, Zheng
Song, Chao
Chronopoulos, Anthony Theodore
Nie, Zhenyu
Author_xml – sequence: 1
  givenname: Zheng
  orcidid: 0000-0003-1144-7599
  surname: Xiao
  fullname: Xiao, Zheng
  email: zxiao@hnu.edu.cn
  organization: College of Information Science and Engineering, Hunan University, Changsha, China
– sequence: 2
  givenname: Zhenyu
  orcidid: 0000-0002-7714-7903
  surname: Nie
  fullname: Nie, Zhenyu
  email: niezhenyu@hnu.edu.cn
  organization: College of Information Science and Engineering, Hunan University, Changsha, China
– sequence: 3
  givenname: Chao
  orcidid: 0000-0002-8141-8585
  surname: Song
  fullname: Song, Chao
  email: song_chao@hnu.edu.cn
  organization: College of Information Science and Engineering, Hunan University, Changsha, China
– sequence: 4
  givenname: Anthony Theodore
  orcidid: 0000-0002-0094-1017
  surname: Chronopoulos
  fullname: Chronopoulos, Anthony Theodore
  email: anthony.chronopoulos@utsa.edu
  organization: Department of Computer Science, University of Texas, San Antonio, TX 78249, USA
CitedBy_id crossref_primary_10_1016_j_measurement_2024_115405
crossref_primary_10_1109_ACCESS_2023_3333338
crossref_primary_10_1371_journal_pone_0294943
crossref_primary_10_1016_j_heliyon_2023_e18992
crossref_primary_10_1109_TCE_2024_3505265
crossref_primary_10_1016_j_eswa_2024_123753
crossref_primary_10_1016_j_jksuci_2024_102010
crossref_primary_10_1016_j_eswa_2024_125747
crossref_primary_10_1016_j_oregeorev_2022_105262
crossref_primary_10_1016_j_asoc_2024_112548
crossref_primary_10_1016_j_eswa_2023_122769
crossref_primary_10_1108_DTA_08_2023_0414
crossref_primary_10_1016_j_eswa_2023_119647
crossref_primary_10_1016_j_eswa_2023_121622
crossref_primary_10_1016_j_knosys_2023_111178
crossref_primary_10_1016_j_ins_2023_119277
crossref_primary_10_1016_j_eswa_2025_128287
Cites_doi 10.1007/s10032-004-0134-3
10.1109/ICCV.2017.543
10.1109/ICCV.2015.123
10.1109/CVPR.2016.90
10.1609/aaai.v34i07.6903
10.1109/CVPR.2016.452
10.1109/CVPR.2016.245
10.1109/CVPR.2016.254
10.1609/aaai.v32i1.12246
10.1109/34.824820
10.1038/nrn755
10.1109/CVPR42600.2020.01354
10.1109/TPAMI.2016.2646371
10.1080/135062800394667
10.1109/ICCV.2019.00481
10.1145/3219819.3219861
10.1109/ICCV48922.2021.01467
10.1609/aaai.v34i07.6735
10.1109/ICCV.2013.102
10.1109/CVPR42600.2020.01213
10.1109/ICCV.2019.00473
10.1109/TPAMI.2018.2848939
10.1609/aaai.v33i01.33018714
10.1609/aaai.v32i1.12252
10.1109/ICCV.2019.00922
10.1016/j.eswa.2014.07.008
10.1109/ICCV.2013.76
10.1609/aaai.v34i07.6891
10.1007/s11263-015-0823-z
ContentType Journal Article
Copyright 2022 Elsevier Ltd
Copyright_xml – notice: 2022 Elsevier Ltd
DBID AAYXX
CITATION
DOI 10.1016/j.eswa.2022.117377
DatabaseName CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISSN 1873-6793
ExternalDocumentID 10_1016_j_eswa_2022_117377
S0957417422007278
ID FETCH-LOGICAL-c300t-7313a6adbe47a4436d4acad4be76059890d7ee49b7ee0321a679f38a60209e8b3
ISICitedReferencesCount 18
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000804932400004&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 0957-4174
IngestDate Sat Nov 29 07:03:31 EST 2025
Tue Nov 18 21:34:47 EST 2025
Fri Feb 23 02:39:24 EST 2024
IsPeerReviewed true
IsScholarly true
Keywords Attention on attention
Deep neural network
Encoder–decoder framework
Scene text recognition
Language English
LinkModel OpenURL
MergedId FETCHMERGED-LOGICAL-c300t-7313a6adbe47a4436d4acad4be76059890d7ee49b7ee0321a679f38a60209e8b3
ORCID 0000-0002-7714-7903
0000-0002-8141-8585
0000-0003-1144-7599
0000-0002-0094-1017
ParticipantIDs crossref_primary_10_1016_j_eswa_2022_117377
crossref_citationtrail_10_1016_j_eswa_2022_117377
elsevier_sciencedirect_doi_10_1016_j_eswa_2022_117377
PublicationCentury 2000
PublicationDate 2022-10-01
2022-10-00
PublicationDateYYYYMMDD 2022-10-01
PublicationDate_xml – month: 10
  year: 2022
  text: 2022-10-01
  day: 01
PublicationDecade 2020
PublicationTitle Expert systems with applications
PublicationYear 2022
Publisher Elsevier Ltd
Publisher_xml – name: Elsevier Ltd
References_xml – reference: He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In
– year: 2017
  ident: b43
  article-title: Scene text recognition with sliding convolutional character models
– reference: Lee, C.-Y., & Osindero, S. (2016). Recursive recurrent nets with attention modeling for ocr in the wild. In
– reference: He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In
– reference: Shi, B., Wang, X., Lyu, P., Yao, C., & Bai, X. (2016). Robust scene text recognition with automatic rectification. In
– reference: Xing, L., Tian, Z., Huang, W., & Scott, M. R. (2019). Convolutional character networks. In
– start-page: 3
  year: 2017
  ident: b42
  article-title: Learning to read irregular text with attention mechanisms.
  publication-title: IJCAI, vol. 1
– year: 2016
  ident: b35
  article-title: Coco-text: Dataset and benchmark for text detection and recognition in natural images
– reference: (pp. 12113–12122).
– volume: 3
  start-page: 201
  year: 2002
  end-page: 215
  ident: b7
  article-title: Control of goal-directed and stimulus-driven attention in the brain
  publication-title: Nature Reviews Neuroscience
– reference: Qiao, Z., Zhou, Y., Yang, D., Zhou, Y., & Wang, W. (2020). Seed: Semantics enhanced encoder-decoder framework for scene text recognition. In
– reference: Yu, D., Li, X., Zhang, C., Liu, T., Han, J., Liu, J., & Ding, E. (2020). Towards accurate scene text recognition with semantic reasoning networks. In
– reference: Bissacco, A., Cummins, M., Netzer, Y., & Neven, H. (2013). Photoocr: Reading text in uncontrolled conditions. In
– volume: 116
  start-page: 1
  year: 2016
  end-page: 20
  ident: b15
  article-title: Reading text in the wild with convolutional neural networks
  publication-title: International Journal of Computer Vision
– reference: Liu, Z., Li, Y., Ren, F., Goh, W. L., & Yu, H. Squeezedtext: A real-time scene text recognition by binary convolutional encoder-decoder network. In
– reference: Wang, T., Zhu, Y., Jin, L., Luo, C., Chen, X., Wu, Y., Wang, Q., & Cai, M. (2020). Decoupled attention network for text recognition. In
– year: 2014
  ident: b2
  article-title: Neural machine translation by jointly learning to align and translate
– volume: 22
  start-page: 38
  year: 2000
  end-page: 62
  ident: b26
  article-title: Twenty years of document image analysis in PAMI
  publication-title: IEEE Transactions on Pattern Analysis and Machine Intelligence
– reference: Hu, W., Cai, X., Hou, J., Yi, S., & Lin, Z. (2020). Gtc: Guided training of ctc towards efficient and accurate scene text recognition. In
– reference: Liao, M., Zhang, J., Wan, Z., Xie, F., Liang, J., Lyu, P., Yao, C., & Bai, X. (2019). Scene text recognition from two-dimensional perspective. In
– volume: 41
  start-page: 2035
  year: 2018
  end-page: 2048
  ident: b33
  article-title: Aster: An attentional scene text recognizer with flexible rectification
  publication-title: IEEE Transactions on Pattern Analysis and Machine Intelligence
– reference: (pp. 9126–9136).
– reference: (pp. 12216–12224).
– reference: Cheng, Z., Bai, F., Xu, Y., Zheng, G., Pu, S., & Zhou, S. (2017). Focusing attention: Towards accurate text recognition in natural images. In
– reference: Huang, L., Wang, W., Chen, J., & Wei, X.-Y. (2019). Attention on attention for image captioning. In
– start-page: 933
  year: 2017
  end-page: 941
  ident: b8
  article-title: Language modeling with gated convolutional networks
  publication-title: International Conference on Machine Learning
– reference: (pp. 5076–5084).
– reference: Liu, W., Chen, C., & Wong, K.-Y. (2018). Char-net: A character-aware neural network for distorted scene text recognition. In
– reference: (pp. 13528–13537).
– reference: (pp. 4715–4723).
– volume: 1
  start-page: 844
  year: 2017
  end-page: 850
  ident: b40
  article-title: Attention-based extraction of structured information from street view imagery
  publication-title: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)
– reference: Bhunia, A. K., Sain, A., Kumar, A., Ghose, S., Chowdhury, P. N., & Song, Y.-Z. (2021). Joint visual semantic reasoning: Multi-stage decoder for text recognition. In
– reference: (pp. 14940–14949).
– reference: Wang, J., & Hu, X. (2017). Gated recurrent convolution neural network for ocr. In
– reference: (pp. 4634–4643).
– volume: 39
  start-page: 2298
  year: 2016
  end-page: 2304
  ident: b31
  article-title: An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition
  publication-title: IEEE Transactions on Pattern Analysis and Machine Intelligence
– reference: (pp. 71–79).
– volume: 7
  start-page: 17
  year: 2000
  end-page: 42
  ident: b29
  article-title: The dynamic representation of scenes
  publication-title: Visual Cognition
– year: 2015
  ident: b16
  article-title: Spatial transformer networks
– reference: (pp. 11005–11012).
– reference: Wan, Z., He, M., Chen, H., Bai, X., & Yao, C. (2020). Textscanner: Reading characters in order for robust scene text recognition. In
– reference: (pp. 2231–2239).
– reference: (pp. 2315–2324).
– reference: (pp. 8714–8721).
– reference: (pp. 4168–4176).
– reference: (pp. 785–792).
– volume: 41
  start-page: 8027
  year: 2014
  end-page: 8048
  ident: b30
  article-title: A robust arbitrary text detection system for natural scene images
  publication-title: Expert Systems with Applications
– reference: (pp. 12120–12127).
– start-page: 1484
  year: 2013
  end-page: 1493
  ident: b18
  article-title: Icdar 2013 robust reading competition
  publication-title: 2013 12th International Conference on Document Analysis and Recognition
– reference: Baek, J., Kim, G., Lee, J., Park, S., Han, D., Yun, S., Oh, S. J., & Lee, H. (2019). What is wrong with scene text recognition model comparisons? dataset and model analysis. In
– year: 2012
  ident: b25
  article-title: Scene text recognition using higher order language priors
  publication-title: BMVC-British Machine Vision Conference
– reference: Gupta, A., Vedaldi, A., & Zisserman, A. (2016). Synthetic data for text localisation in natural images. In
– reference: (pp. 1026–1034).
– reference: (pp. 770–778).
– reference: Borisyuk, F., Gordo, A., & Sivakumar, V. (2018). Rosetta: Large scale system for text detection and recognition in images. In
– reference: Phan, T. Q., Shivakumara, P., Tian, S., & Tan, C. L. (2013). Recognizing text with perspective distortion in natural scenes. In
– year: 2014
  ident: b34
  article-title: Very deep convolutional networks for large-scale image recognition
– reference: , (pp. 334–343).
– start-page: 1457
  year: 2011
  end-page: 1464
  ident: b37
  article-title: End-to-end scene text recognition
  publication-title: 2011 International Conference on Computer Vision
– year: 2021
  ident: b46
  article-title: CDistNet: Perceiving multi-domain character distance for robust text recognition
– year: 2014
  ident: b14
  article-title: Synthetic data and artificial neural networks for natural scene text recognition
– year: 2012
  ident: b45
  article-title: Adadelta: an adaptive learning rate method
– reference: .
– reference: (pp. 569–576).
– start-page: 7
  year: 2016
  ident: b22
  article-title: Star-net: a spatial attention residue network for scene text recognition.
  publication-title: BMVC, vol. 2
– volume: 7
  start-page: 105
  year: 2005
  end-page: 122
  ident: b24
  article-title: Icdar 2003 robust reading competitions: entries, results, and future directions
  publication-title: International Journal of Document Analysis and Recognition (IJDAR)
– start-page: 1156
  year: 2015
  end-page: 1160
  ident: b17
  article-title: Icdar 2015 competition on robust reading
  publication-title: 2015 13th International Conference on Document Analysis and Recognition (ICDAR)
– volume: 7
  start-page: 105
  issue: 2–3
  year: 2005
  ident: 10.1016/j.eswa.2022.117377_b24
  article-title: Icdar 2003 robust reading competitions: entries, results, and future directions
  publication-title: International Journal of Document Analysis and Recognition (IJDAR)
  doi: 10.1007/s10032-004-0134-3
– year: 2012
  ident: 10.1016/j.eswa.2022.117377_b45
– ident: 10.1016/j.eswa.2022.117377_b6
  doi: 10.1109/ICCV.2017.543
– ident: 10.1016/j.eswa.2022.117377_b10
  doi: 10.1109/ICCV.2015.123
– ident: 10.1016/j.eswa.2022.117377_b11
  doi: 10.1109/CVPR.2016.90
– year: 2015
  ident: 10.1016/j.eswa.2022.117377_b16
– ident: 10.1016/j.eswa.2022.117377_b39
  doi: 10.1609/aaai.v34i07.6903
– ident: 10.1016/j.eswa.2022.117377_b32
  doi: 10.1109/CVPR.2016.452
– ident: 10.1016/j.eswa.2022.117377_b19
  doi: 10.1109/CVPR.2016.245
– start-page: 1484
  year: 2013
  ident: 10.1016/j.eswa.2022.117377_b18
  article-title: Icdar 2013 robust reading competition
– year: 2014
  ident: 10.1016/j.eswa.2022.117377_b34
– ident: 10.1016/j.eswa.2022.117377_b9
  doi: 10.1109/CVPR.2016.254
– ident: 10.1016/j.eswa.2022.117377_b21
  doi: 10.1609/aaai.v32i1.12246
– volume: 22
  start-page: 38
  issue: 1
  year: 2000
  ident: 10.1016/j.eswa.2022.117377_b26
  article-title: Twenty years of document image analysis in PAMI
  publication-title: IEEE Transactions on Pattern Analysis and Machine Intelligence
  doi: 10.1109/34.824820
– volume: 3
  start-page: 201
  issue: 3
  year: 2002
  ident: 10.1016/j.eswa.2022.117377_b7
  article-title: Control of goal-directed and stimulus-driven attention in the brain
  publication-title: Nature Reviews Neuroscience
  doi: 10.1038/nrn755
– ident: 10.1016/j.eswa.2022.117377_b28
  doi: 10.1109/CVPR42600.2020.01354
– year: 2016
  ident: 10.1016/j.eswa.2022.117377_b35
– volume: 39
  start-page: 2298
  issue: 11
  year: 2016
  ident: 10.1016/j.eswa.2022.117377_b31
  article-title: An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition
  publication-title: IEEE Transactions on Pattern Analysis and Machine Intelligence
  doi: 10.1109/TPAMI.2016.2646371
– ident: 10.1016/j.eswa.2022.117377_b38
– volume: 7
  start-page: 17
  issue: 1–3
  year: 2000
  ident: 10.1016/j.eswa.2022.117377_b29
  article-title: The dynamic representation of scenes
  publication-title: Visual Cognition
  doi: 10.1080/135062800394667
– start-page: 7
  year: 2016
  ident: 10.1016/j.eswa.2022.117377_b22
  article-title: Star-net: a spatial attention residue network for scene text recognition.
– ident: 10.1016/j.eswa.2022.117377_b1
  doi: 10.1109/ICCV.2019.00481
– start-page: 933
  year: 2017
  ident: 10.1016/j.eswa.2022.117377_b8
  article-title: Language modeling with gated convolutional networks
– start-page: 3
  year: 2017
  ident: 10.1016/j.eswa.2022.117377_b42
  article-title: Learning to read irregular text with attention mechanisms.
– ident: 10.1016/j.eswa.2022.117377_b5
  doi: 10.1145/3219819.3219861
– start-page: 1156
  year: 2015
  ident: 10.1016/j.eswa.2022.117377_b17
  article-title: Icdar 2015 competition on robust reading
– ident: 10.1016/j.eswa.2022.117377_b3
  doi: 10.1109/ICCV48922.2021.01467
– ident: 10.1016/j.eswa.2022.117377_b12
  doi: 10.1609/aaai.v34i07.6735
– volume: 1
  start-page: 844
  year: 2017
  ident: 10.1016/j.eswa.2022.117377_b40
  article-title: Attention-based extraction of structured information from street view imagery
– ident: 10.1016/j.eswa.2022.117377_b4
  doi: 10.1109/ICCV.2013.102
– year: 2014
  ident: 10.1016/j.eswa.2022.117377_b14
– ident: 10.1016/j.eswa.2022.117377_b44
  doi: 10.1109/CVPR42600.2020.01213
– ident: 10.1016/j.eswa.2022.117377_b13
  doi: 10.1109/ICCV.2019.00473
– year: 2021
  ident: 10.1016/j.eswa.2022.117377_b46
– volume: 41
  start-page: 2035
  issue: 9
  year: 2018
  ident: 10.1016/j.eswa.2022.117377_b33
  article-title: Aster: An attentional scene text recognizer with flexible rectification
  publication-title: IEEE Transactions on Pattern Analysis and Machine Intelligence
  doi: 10.1109/TPAMI.2018.2848939
– year: 2012
  ident: 10.1016/j.eswa.2022.117377_b25
  article-title: Scene text recognition using higher order language priors
– ident: 10.1016/j.eswa.2022.117377_b20
  doi: 10.1609/aaai.v33i01.33018714
– ident: 10.1016/j.eswa.2022.117377_b23
  doi: 10.1609/aaai.v32i1.12252
– start-page: 1457
  year: 2011
  ident: 10.1016/j.eswa.2022.117377_b37
  article-title: End-to-end scene text recognition
– ident: 10.1016/j.eswa.2022.117377_b41
  doi: 10.1109/ICCV.2019.00922
– volume: 41
  start-page: 8027
  issue: 18
  year: 2014
  ident: 10.1016/j.eswa.2022.117377_b30
  article-title: A robust arbitrary text detection system for natural scene images
  publication-title: Expert Systems with Applications
  doi: 10.1016/j.eswa.2014.07.008
– year: 2014
  ident: 10.1016/j.eswa.2022.117377_b2
– ident: 10.1016/j.eswa.2022.117377_b27
  doi: 10.1109/ICCV.2013.76
– ident: 10.1016/j.eswa.2022.117377_b36
  doi: 10.1609/aaai.v34i07.6891
– year: 2017
  ident: 10.1016/j.eswa.2022.117377_b43
– volume: 116
  start-page: 1
  issue: 1
  year: 2016
  ident: 10.1016/j.eswa.2022.117377_b15
  article-title: Reading text in the wild with convolutional neural networks
  publication-title: International Journal of Computer Vision
  doi: 10.1007/s11263-015-0823-z
SSID ssj0017007
Score 2.4939363
SourceID crossref
elsevier
SourceType Enrichment Source
Index Database
Publisher
StartPage 117377
SubjectTerms Attention on attention
Deep neural network
Encoder–decoder framework
Scene text recognition
Title An extended attention mechanism for scene text recognition
URI https://dx.doi.org/10.1016/j.eswa.2022.117377
Volume 203
WOSCitedRecordID wos000804932400004&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVESC
  databaseName: ScienceDirect Freedom Collection - Elsevier
  customDbUrl:
  eissn: 1873-6793
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0017007
  issn: 0957-4174
  databaseCode: AIEXJ
  dateStart: 19950101
  isFulltext: true
  titleUrlDefault: https://www.sciencedirect.com
  providerName: Elsevier