Hybrid WideResNet-Dual Spatial Embedding Vision Transformer with SE blocks: a high-accuracy model for Land Use and Land Cover classification in remote sensing

Detailed bibliography
Title: Hybrid WideResNet-Dual Spatial Embedding Vision Transformer with SE blocks: a high-accuracy model for Land Use and Land Cover classification in remote sensing
Authors: Ijaz Hussain, Wei Chen, Yasir Iqbal, Anjum Iqbal, Si-Liang Li
Source: Big Earth Data, pp. 1-40 (2025)
Publisher information: Taylor & Francis Group, 2025.
Publication year: 2025
Collection: LCC:Geography. Anthropology. Recreation
LCC:Geology
Subjects: Land Use and Land Cover, Wide Residual Network, Vision Transformers, remote sensing, Squeeze-and-Excitation, cross-modal transformers, Geography. Anthropology. Recreation, Geology, QE1-996.5
Description: Land Use and Land Cover (LULC) classification is critical for environmental monitoring and sustainable resource management, but it remains difficult to capture complex spatial-spectral features and long-range dependencies in remote sensing imagery accurately. To address this, we introduce a Hybrid Wide Residual Network-Dual Spatial Positional Embedding Vision Transformer (WRN-DSPViT) framework enhanced with Squeeze-and-Excitation (SE) blocks and dual spatial positional embeddings. The model integrates a Wide Residual Network (WRN) for local spatial feature extraction with a Vision Transformer (ViT) that uses a novel dual spatial encoding to capture global context via multi-head self-attention, while SE blocks dynamically recalibrate channel-wise features. Attention pooling fuses the spatial features by adaptively weighting important regions of the image, further enhancing classification accuracy. For multimodal hyperspectral-LiDAR data (Houston 2013), we extend this framework with parallel WRN-SE streams and cross-modal transformers, preserving spatial relationships through dual encodings. Evaluated on three benchmarks, EuroSAT (multispectral), Houston 2013 (hyperspectral-LiDAR), and DeepGlobe (road extraction), the hybrid WRN-DSPViT achieves state-of-the-art performance: 98.80% accuracy on EuroSAT (3.26 M parameters, 201.68 MFLOPS), 91.24% overall accuracy and 92.18% average accuracy on Houston 2013, and 0.802 F1-score/0.660 IoU on DeepGlobe.
Document type: article
File description: electronic resource
Language: English
ISSN: 2574-5417
2096-4471
Relation: https://doaj.org/toc/2096-4471; https://doaj.org/toc/2574-5417
DOI: 10.1080/20964471.2025.2587987
Access URL: https://doaj.org/article/76a60f54924449f08b48822a657b7002
Accession number: edsdoj.76a60f54924449f08b48822a657b7002
Database: Directory of Open Access Journals
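
Illustrative note: the description above mentions two mechanisms, SE blocks that recalibrate channel-wise features and attention pooling that weights important regions. The following is a minimal sketch of both ideas in plain Python, not the authors' implementation. All function names, the toy weights, the reduction ratio of 2, the omitted biases, and the fixed query vector are assumptions made only for illustration; the record does not specify any of them, and a real model would use learned parameters in a tensor library.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def se_block(feature_map, w_reduce, w_expand):
    """Squeeze-and-Excitation over a map shaped [channels][height][width].

    w_reduce is (channels // r) x channels, w_expand is channels x (channels // r).
    Both are hypothetical weights; biases are omitted for brevity.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation: bottleneck MLP (ReLU, then sigmoid) yields gates in (0, 1).
    hidden = [max(0.0, h) for h in matvec(w_reduce, squeezed)]
    gates = [sigmoid(g) for g in matvec(w_expand, hidden)]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


def attention_pool(tokens, query):
    """Pool a list of token vectors into one vector using softmax weights."""
    scale = math.sqrt(len(query))
    scores = [sum(t * q for t, q in zip(token, query)) / scale for token in tokens]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [
        sum(w * token[d] for w, token in zip(weights, tokens))
        for d in range(len(query))
    ]
    return pooled, weights


# Toy input (assumed): 2 channels of 2x2 activations, reduction ratio r = 2.
feature_map = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
w_reduce = [[0.5, -0.5]]    # 1 x 2
w_expand = [[1.0], [-1.0]]  # 2 x 1
scaled, gates = se_block(feature_map, w_reduce, w_expand)

# Treat each spatial position as a token holding its per-channel values.
tokens = [
    [scaled[c][i][j] for c in range(2)]
    for i in range(2)
    for j in range(2)
]
pooled, weights = attention_pool(tokens, query=[1.0, 0.0])
print("gates:", gates)
print("pooled:", pooled)
```

In this toy setup the channel with larger average activation receives the larger gate, and the spatial position with the strongest response along the query direction receives the largest pooling weight.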