An extended attention mechanism for scene text recognition
| Published in: | Expert Systems with Applications, Vol. 203, p. 117377 |
|---|---|
| Main authors: | , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 01.10.2022 |
| Keywords: | |
| ISSN: | 0957-4174, 1873-6793 |
| Online access: | Full text |
| Abstract: | Scene text recognition (STR) refers to obtaining text information from natural text images. The task is more challenging than optical character recognition (OCR) because of the variability of scenes. The attention mechanism, which assigns a different weight to each feature vector at each time step, guides the decoding process of text recognition. However, when the given query and the key/value are unrelated, the generated attention result contains irrelevant information, which can lead the model to wrong predictions. In this paper, we propose an extended attention-based framework for STR tasks. In particular, we integrate an extended attention mechanism named Attention on Attention (AoA), which can determine the relevance between attention results and queries, into both the encoder and the decoder of a common text recognition framework. Using two separate linear functions, the AoA module generates an information vector and an attention gate from the attention result and the current context. AoA then applies element-wise multiplication of the gate and the information vector to obtain the final attended information. Our method is compared with seven benchmarks over eight datasets. Experimental results show that our method outperforms all seven benchmarks, by 6.7% and 1.4% on average over the worst and best prior works, respectively.
•An extended attention mechanism is designed to recognize scene texts. •Attention on Attention (AoA) is applied in the encoder and decoder modules. •The method outperforms seven benchmarks by 4.5% on average over 10 datasets. |
|---|---|
| ISSN: | 0957-4174, 1873-6793 |
| DOI: | 10.1016/j.eswa.2022.117377 |
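The AoA mechanism summarized in the abstract (two linear maps over the attention result and the current context, producing an information vector and a sigmoid gate that are multiplied element-wise) can be sketched as follows. This is a minimal NumPy sketch based only on the abstract's description; the weight names `W_i`, `W_g` and the toy dimensions are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def attention_on_attention(v, q, W_i, b_i, W_g, b_g):
    """Sketch of Attention on Attention (AoA).

    v : attention result (attended feature vector)
    q : current query/context vector
    Two separate linear functions on [v; q] produce an information
    vector and an attention gate; their element-wise product is the
    final attended information.
    """
    vq = np.concatenate([v, q])                       # combine result and context
    info = W_i @ vq + b_i                             # information vector
    gate = 1.0 / (1.0 + np.exp(-(W_g @ vq + b_g)))    # sigmoid attention gate in (0, 1)
    return gate * info                                # element-wise multiplication

# Toy usage with random weights (dimensions are arbitrary illustrations).
d = 4
rng = np.random.default_rng(0)
v = rng.standard_normal(d)                 # attention result
q = rng.standard_normal(d)                 # current query/context
W_i = rng.standard_normal((d, 2 * d)); b_i = np.zeros(d)
W_g = rng.standard_normal((d, 2 * d)); b_g = np.zeros(d)
out = attention_on_attention(v, q, W_i, b_i, W_g, b_g)
print(out.shape)  # (4,)
```

Because the gate lies strictly in (0, 1), AoA can only attenuate each component of the information vector, which is how it suppresses attention results that are irrelevant to the query.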