TMAR: 3-D Transformer Network via Masked Autoencoder Regularization for Hyperspectral Sharpening

Bibliographic Details
Published in: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, Vol. 18, pp. 15845-15862
Main Authors: Dehghan, Zeinab; Yang, Jingxiang; Yazdi, Mehran; Khader, Abdolraheem; Xiao, Liang
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2025
ISSN: 1939-1404, 2151-1535
Description
Summary: Fusion-based hyperspectral super-resolution techniques increase the spatial resolution of a hyperspectral image (HSI) by fusing it with a high-spatial-resolution auxiliary image. Transformers have proven effective in vision tasks because they can learn global and long-range information, and many networks have applied them to tasks such as super-resolution. However, networks built on convolutional neural networks (CNNs) or transformers often incur considerable computational complexity. In addition, prior networks often lack an explicit regularization term. In this study, we combine the strengths of CNN and transformer models and propose a multistage deep transformer-based super-resolution network that is regularized via an asymmetric autoencoder structure. We use a 3-D convolution layer in the light transformer structure because it allows more flexible computation of correlations between HSI layers and better captures dependencies within spectral-spatial features. We apply a spectral masking autoencoder in an asymmetric structure to extract superior prior features from the training data and regularize the network. Experimental results on remote sensing HSI datasets demonstrate that the proposed network provides superior efficiency compared with state-of-the-art fusion-based super-resolution approaches.
DOI:10.1109/JSTARS.2025.3580093
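
Note: The summary mentions a spectral masking autoencoder but this record does not describe how the masking is done. As a rough illustration only, the sketch below zeroes a random subset of spectral bands across a set of pixel spectra, which is one common way to build the input for a masked autoencoder. The function name `mask_spectral_bands`, the `mask_ratio` and `seed` parameters, the zero-fill choice, and the toy data are all assumptions made for this example and are not taken from the paper.

```python
import random


def mask_spectral_bands(pixels, mask_ratio=0.5, seed=0):
    """Zero a random subset of bands in a list of per-pixel spectra.

    pixels: list of spectra, each a list of band values of equal length.
    Returns the masked spectra and the sorted indices of the masked bands.
    Hypothetical helper; the paper's actual masking scheme may differ.
    """
    num_bands = len(pixels[0])
    num_masked = int(mask_ratio * num_bands)
    rng = random.Random(seed)
    # The same bands are masked for every pixel (masking along the spectral axis).
    masked_bands = set(rng.sample(range(num_bands), num_masked))
    masked = [
        [0.0 if b in masked_bands else value for b, value in enumerate(spectrum)]
        for spectrum in pixels
    ]
    return masked, sorted(masked_bands)


# Toy data: 4 pixels with 8 bands each (arbitrary nonzero values).
pixels = [[float(p * 10 + b + 1) for b in range(8)] for p in range(4)]
masked_pixels, masked_bands = mask_spectral_bands(pixels, mask_ratio=0.5)
```

In a masked-autoencoder setup, an encoder would see only the unmasked bands and a decoder would be trained to reconstruct the masked ones; that training loop is outside the scope of this sketch.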