Bidirectional trained tree-structured decoder for Handwritten Mathematical Expression Recognition
| Published in: | Pattern Recognition Vol. 165; p. 111599 |
|---|---|
| Main Authors: | , , , , , |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier Ltd, 01.09.2025 |
| ISSN: | 0031-3203 |
| Summary: | The Handwritten Mathematical Expression Recognition (HMER) task is a critical branch in the field of Optical Character Recognition (OCR). Recent studies have demonstrated that incorporating bidirectional context information significantly improves the performance of HMER models. However, existing methods fail to effectively utilize bidirectional context information during the inference stage. Furthermore, current bidirectional training methods are primarily designed for string decoders and cannot adequately generalize to tree decoders, which offer superior generalization capabilities and structural analysis capacity. To overcome these limitations, we propose the Mirror-Flipped Symbol Layout Tree (MF-SLT) and Bidirectional Asynchronous Training (BAT) structure. Our method extends the bidirectional training strategy to the tree decoder, enabling more effective training by leveraging bidirectional information. Additionally, we analyze the impact of the visual and linguistic perception of the HMER model separately and introduce the Shared Language Modeling (SLM) mechanism. Through the SLM, we enhance the model's robustness and generalization when dealing with visual ambiguity, especially in scenarios with abundant training data. Our approach has been validated through extensive experiments, demonstrating its ability to achieve new state-of-the-art results on the CROHME 2014, 2016, and 2019 datasets, as well as the HME100K dataset. The code used in our experiments will be publicly available at https://github.com/Hanbo-Cheng/BAT.git. |
|---|---|
| Highlights: | • We propose MF-SLT and BAT to add bidirectional context to the tree decoder. • We propose SLM to enhance language perception without additional parameters. • Our method achieves SOTA results, generalizing well to tree and string decoders. |
| DOI: | 10.1016/j.patcog.2025.111599 |