MAE-CoReNet: Masking Autoencoder-Convolutional Reformer Net

Bibliographic Details
Published in: International Journal of Information System Modeling and Design, Vol. 16, No. 1, pp. 1-23
Main Authors: Wang, Di; Wang, Li; Zhou, Yuyang
Format: Journal Article
Language: English
Published: Hershey: IGI Global, 01.01.2025
ISSN: 1947-8186, 1947-8194
Description
Summary: Transformers are widely used in computer vision for feature extraction, object detection, and image classification. Many methods boost performance by adding convolutional or attention layers, but the resulting large models are costly to train. This manuscript proposes a Masking Autoencoder-Convolutional Reformer Net (MAE-CoReNet) to improve accuracy while reducing training time. It employs an attention mechanism based on Locality-Sensitive Hashing (LSH) to cut training time and raise classification accuracy, and it uses a masking technique for module pre-training to further improve results. The experimental results show that the model performs well on the CIFAR-100 dataset: compared with the Convolutional Attention Transformer Network (CoAtNet), MAE-CoReNet converges in 25 epochs rather than 65, and its accuracy increases from 53.1% to 90.2%. Compared with other models on the ImageNet22k dataset, it achieves the highest accuracy and the fastest convergence, reaching 89.5% accuracy in 55 epochs.
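
The record does not reproduce the paper's implementation, but both named ingredients are well-established techniques that can be sketched generically. Reformer-style LSH attention hashes sequence positions into buckets via random projections and computes softmax attention only within each bucket, which is the source of the training-time savings. The Python sketch below is a minimal, single-round, single-head illustration under the Reformer's shared query/key assumption; the helper names lsh_buckets and lsh_attention are illustrative, not from the paper.

    import numpy as np

    def lsh_buckets(x, n_bits, rng):
        # Random-projection LSH: the sign pattern of n_bits projections
        # gives each sequence position an integer bucket id.
        projections = rng.standard_normal((x.shape[-1], n_bits))
        bits = (x @ projections) > 0                      # (seq_len, n_bits)
        return (bits * (2 ** np.arange(n_bits))).sum(axis=-1)

    def lsh_attention(q, k, v, n_bits=3, seed=0):
        # Softmax attention restricted to positions sharing an LSH bucket.
        # Hashing q alone suffices when queries and keys are tied (shared
        # QK), as in the Reformer; real implementations use several hash
        # rounds to reduce the chance of missing relevant pairs.
        rng = np.random.default_rng(seed)
        buckets = lsh_buckets(q, n_bits, rng)
        out = np.zeros_like(v)
        for b in np.unique(buckets):
            idx = np.where(buckets == b)[0]
            scores = q[idx] @ k[idx].T / np.sqrt(q.shape[-1])
            weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
            weights /= weights.sum(axis=-1, keepdims=True)
            out[idx] = weights @ v[idx]
        return out

Because attention is computed per bucket, the cost drops from quadratic in sequence length toward roughly linear when buckets stay small, at the price of ignoring attention between positions that hash apart.

The masked pre-training step can likewise be sketched in the spirit of masked autoencoders: a large random fraction of patch embeddings is hidden, the encoder sees only the visible patches, and a decoder is trained to reconstruct the rest. The helper random_masking below is again a hypothetical illustration, not the paper's code.

    def random_masking(patches, mask_ratio=0.75, seed=0):
        # Keep a random subset of patch embeddings; True in `mask` marks
        # the patches hidden from the encoder and reconstructed later.
        rng = np.random.default_rng(seed)
        n = patches.shape[0]
        n_keep = max(1, int(n * (1 - mask_ratio)))
        keep_idx = np.sort(rng.permutation(n)[:n_keep])
        mask = np.ones(n, dtype=bool)
        mask[keep_idx] = False
        return patches[keep_idx], keep_idx, mask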
DOI: 10.4018/IJISMD.371399