Hybrid WideResNet-Dual Spatial Embedding Vision Transformer with SE blocks: a high-accuracy model for Land Use and Land Cover classification in remote sensing

Saved in:
Detailed bibliography
Title: Hybrid WideResNet-Dual Spatial Embedding Vision Transformer with SE blocks: a high-accuracy model for Land Use and Land Cover classification in remote sensing
Authors: Ijaz Hussain, Wei Chen, Yasir Iqbal, Anjum Iqbal, Si-Liang Li
Source: Big Earth Data, pp. 1-40 (2025)
Publisher information: Taylor & Francis Group, 2025.
Publication year: 2025
Collection: LCC:Geography. Anthropology. Recreation
LCC:Geology
Subjects: Land Use and Land Cover, Wide Residual Network, Vision Transformers, remote sensing, Squeeze-and-Excitation, cross-modal transformers, Geography. Anthropology. Recreation, Geology, QE1-996.5
Description: Land Use and Land Cover (LULC) classification is critical for environmental monitoring and sustainable resource management, but faces challenges in accurately capturing complex spatial-spectral features and long-range dependencies in remote sensing imagery. To address this, we introduce a Hybrid Wide Residual Network-Dual Spatial Positional Embedding Vision Transformer (WRN-DSPViT) framework enhanced with Squeeze-and-Excitation (SE) blocks and dual spatial positional embeddings. This model integrates a Wide Residual Network (WRN) for local spatial feature extraction and a Vision Transformer (ViT) with novel dual spatial encoding to capture global context via multi-head self-attention, where SE blocks dynamically recalibrate channel-wise features. Attention pooling is employed to fuse spatial features, allowing for adaptive weighting of important regions in the image, further enhancing classification accuracy. For multimodal hyperspectral-LiDAR data (Houston 2013), we extend this framework with parallel WRN-SE streams and cross-modal transformers, preserving spatial relationships through dual encodings. Evaluated on three benchmarks (EuroSAT multispectral, Houston 2013 hyperspectral-LiDAR, and DeepGlobe road extraction), the hybrid WRN-DSPViT achieves state-of-the-art performance: 98.80% accuracy on EuroSAT (3.26 M parameters, 201.68 MFLOPS), 91.24% overall accuracy and 92.18% average accuracy on Houston 2013, and 0.802 F1-score / 0.660 IoU on DeepGlobe.
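The channel recalibration the abstract attributes to the SE blocks (squeeze to a per-channel descriptor, excite through a bottleneck, rescale the feature map) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the bottleneck weights here are random and untrained, the reduction ratio `r=4` is an assumed value, and a real model would learn these weights end-to-end inside the WRN-SE streams.

```python
import numpy as np

def se_block(x, r=4):
    """Squeeze-and-Excitation channel recalibration (illustrative only).

    x : feature map of shape (C, H, W).
    r : bottleneck reduction ratio (assumed value, not from the paper).
    """
    rng = np.random.default_rng(0)
    c = x.shape[0]
    # Squeeze: global average pooling -> one descriptor per channel
    z = x.mean(axis=(1, 2))                                  # shape (C,)
    # Excitation: bottleneck MLP with random (untrained) weights for illustration
    w1 = rng.standard_normal((c // r, c)) * 0.1              # reduce C -> C/r
    w2 = rng.standard_normal((c, c // r)) * 0.1              # expand C/r -> C
    s = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(w1 @ z, 0.0))))  # sigmoid gate in (0, 1)
    # Scale: reweight each input channel by its learned importance
    return x * s[:, None, None]

feat = np.ones((8, 4, 4))      # toy 8-channel feature map
out = se_block(feat)
print(out.shape)               # same (8, 4, 4) shape, channels rescaled
```

The output keeps the input's shape; only the relative magnitude of each channel changes, which is why SE blocks can be dropped into residual streams without altering the surrounding architecture.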
Document type: article
File description: electronic resource
Language: English
ISSN: 2574-5417
2096-4471
Relation: https://doaj.org/toc/2096-4471; https://doaj.org/toc/2574-5417
DOI: 10.1080/20964471.2025.2587987
Access URL: https://doaj.org/article/76a60f54924449f08b48822a657b7002
Accession number: edsdoj.76a60f54924449f08b48822a657b7002
Database: Directory of Open Access Journals