MAE-CoReNet: Masking Autoencoder-Convolutional Reformer Net

Bibliographic Details
Published in: International Journal of Information System Modeling and Design, Vol. 16, No. 1, pp. 1-23
Main Authors: Wang, Di; Wang, Li; Zhou, Yuyang
Format: Journal Article
Language: English
Published: Hershey: IGI Global, 01.01.2025
ISSN: 1947-8186, 1947-8194
Description
Summary: Transformers are widely used in computer vision for feature extraction, object detection, and image classification. Many methods boost performance by adding convolutional or attention layers, but larger models incur high training costs. This manuscript proposes a Masking Autoencoder-Convolutional Reformer Net (MAE-CoReNet) that improves accuracy while reducing training time. It employs an attention mechanism based on locality-sensitive hashing (LSH) to cut training time and raise classification accuracy, and it uses masked pre-training of its modules to further improve results. Experiments show that the model performs well on the CIFAR-100 dataset: compared to the Convolutional Attention Transformer Network (CoAtNet), MAE-CoReNet converges in 25 epochs instead of 65, and its accuracy rises from 53.1% to 90.2%. On the ImageNet22k dataset, the model achieves the highest accuracy and fastest convergence among the compared models, converging in 55 epochs with an accuracy of 89.5%.
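The two techniques the abstract names, LSH-based attention and masked pre-training, can be illustrated independently of the paper. The following is a minimal NumPy sketch, not the authors' implementation: lsh_buckets shows Reformer-style angular LSH, which restricts attention to tokens that hash into the same bucket, and random_mask shows MAE-style random patch masking used during pre-training. The function names, bucket count, and mask ratio are illustrative assumptions.

import numpy as np

def lsh_buckets(x, n_buckets, rng):
    """Angular LSH as used in Reformer-style attention: rotate token
    vectors with a random matrix and hash each token to the argmax of
    [xR, -xR]. Attention is then computed only among tokens sharing a
    bucket, avoiding the full O(L^2) attention cost."""
    d = x.shape[-1]
    R = rng.normal(size=(d, n_buckets // 2))      # random rotation matrix
    xR = x @ R
    return np.argmax(np.concatenate([xR, -xR], axis=-1), axis=-1)

def random_mask(tokens, mask_ratio, rng):
    """MAE-style masking: keep a random subset of patch tokens; the
    encoder is pre-trained on the visible tokens only."""
    L = tokens.shape[0]
    keep = np.sort(rng.permutation(L)[: int(L * (1 - mask_ratio))])
    return tokens[keep], keep

rng = np.random.default_rng(0)
tokens = rng.normal(size=(64, 32))                # 64 patch tokens, dim 32
visible, keep_idx = random_mask(tokens, mask_ratio=0.75, rng=rng)
buckets = lsh_buckets(visible, n_buckets=8, rng=rng)
print(visible.shape, np.bincount(buckets, minlength=8))

Running the sketch prints the shape of the visible tokens, (16, 32) after masking 75% of 64 patches, followed by the occupancy count of each hash bucket.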
DOI: 10.4018/IJISMD.371399