Multi-Scale Spatial Perception Attention Network for Few-Shot Hyperspectral Image Classification

Bibliographic details
Published in:	IEEE Access, Vol. 12, pp. 173076-173090
Main authors:	Li, Yang; Luo, Jian; Long, Haoyu; Jin, Qianqian
Format:	Journal Article
Language:	English
Publication details:	Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2024
Subjects:	Artificial neural networks; attention; Classification; Convolution; Convolutional neural networks; Data mining; Decoding; Encoders-Decoders; Feature extraction; few-shot learning; fully convolutional network (FCN); Hyperspectral image (HSI) classification; Hyperspectral imaging; Image classification; Image enhancement; Kernel; Modules; multi-scale; Neural networks; Perception; Semantics; Spatial data; Training; transformer; Transformers
ISSN:	2169-3536

Description
Summary:	In hyperspectral image (HSI) classification, combining the strengths of convolutional neural networks (CNNs) and Transformers can significantly enhance classification performance and model robustness. However, networks that combine CNNs and Transformers are limited in classification accuracy and generalization when class samples are imbalanced, particularly in few-shot training scenarios. To address these problems, we propose a multi-scale spatial perception attention network (Ms-SPA) for few-shot HSI classification. The method is based on an encoder-decoder fully convolutional network (FCN) architecture in which the encoder combines a CNN with a Transformer module to extract local and global spatial-spectral joint features simultaneously. In the encoder, the spatial contraction perception Transformer (SCPFormer) is proposed to improve the model's capacity for perceiving global-local joint features. Next, the multi-scale spatial attention (MSSA) module is proposed to capture spatial information at different convolution kernel scales and cascade the results into a more comprehensive representation. In the decoder, adaptive residual aggregation (ARA) is proposed to embed high-level semantic information into low-level features through a residual structure, thereby enhancing the perception of contextual information. A weighted loss function (CL-MixedLoss) is proposed to address the problem of imbalanced heterogeneous pixels in HSIs. Experimental results on three well-known HSI datasets indicate that our model achieves optimal classification performance, exceeding 95%, even when trained with a limited number of class samples.
DOI:	10.1109/ACCESS.2024.3501412
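
Note on class imbalance: the summary mentions a weighted loss for imbalanced pixels but does not give its form. The sketch below is not the authors' CL-MixedLoss; it only illustrates the general idea of class weighting, assuming inverse-frequency weights and a plain weighted cross-entropy. The function names and the toy labels are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by total / (n_classes * class_count).

    Rare classes get larger weights. This is a common heuristic,
    assumed here for illustration only.
    """
    counts = Counter(labels)
    n_classes = len(counts)
    total = len(labels)
    return {c: total / (n_classes * k) for c, k in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class) over labeled pixels.

    probs[i][c] is the predicted probability of class c for pixel i.
    """
    num = 0.0
    den = 0.0
    for p, y in zip(probs, labels):
        w = weights[y]
        num += -w * math.log(max(p[y], 1e-12))  # clamp to avoid log(0)
        den += w
    return num / den


# Toy example: 9 labeled pixels, three classes with 6, 2 and 1 samples.
labels = [0, 0, 0, 0, 0, 0, 1, 1, 2]
weights = inverse_frequency_weights(labels)

# A classifier that predicts a uniform distribution for every pixel.
uniform = [[1 / 3] * 3 for _ in labels]
loss = weighted_cross_entropy(uniform, labels, weights)
```

With these toy labels the weights come out as 0.5, 1.5 and 3.0 for classes 0, 1 and 2, so a mistake on the single class-2 pixel counts six times as much as one on a class-0 pixel.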