Hybrid WideResNet-Dual Spatial Embedding Vision Transformer with SE blocks: a high-accuracy model for Land Use and Land Cover classification in remote sensing
| Title: | Hybrid WideResNet-Dual Spatial Embedding Vision Transformer with SE blocks: a high-accuracy model for Land Use and Land Cover classification in remote sensing |
|---|---|
| Authors: | Ijaz Hussain, Wei Chen, Yasir Iqbal, Anjum Iqbal, Si-Liang Li |
| Source: | Big Earth Data, pp. 1-40 (2025) |
| Publisher information: | Taylor & Francis Group, 2025. |
| Publication year: | 2025 |
| Collection: | LCC: Geography. Anthropology. Recreation; LCC: Geology |
| Subjects: | Land Use and Land Cover, Wide Residual Network, Vision Transformers, remote sensing, Squeeze-and-Excitation, cross-modal transformers, Geography. Anthropology. Recreation, Geology, QE1-996.5 |
| Description: | Land Use and Land Cover (LULC) classification is critical for environmental monitoring and sustainable resource management, but it faces challenges in accurately capturing complex spatial-spectral features and long-range dependencies in remote sensing imagery. To address this, we introduce a Hybrid Wide Residual Network-Dual Spatial Positional Embedding Vision Transformer (WRN-DSPViT) framework enhanced with Squeeze-and-Excitation (SE) blocks and dual spatial positional embeddings. The model integrates a Wide Residual Network (WRN) for local spatial feature extraction with a Vision Transformer (ViT) that uses a novel dual spatial encoding to capture global context via multi-head self-attention, while SE blocks dynamically recalibrate channel-wise features. Attention pooling fuses the spatial features, adaptively weighting important regions of the image and further enhancing classification accuracy. For multimodal hyperspectral-LiDAR data (Houston 2013), we extend the framework with parallel WRN-SE streams and cross-modal transformers, preserving spatial relationships through dual encodings. On three benchmarks, namely EuroSAT (multispectral), Houston 2013 (hyperspectral-LiDAR), and DeepGlobe (road extraction), the hybrid WRN-DSPViT achieves state-of-the-art performance: 98.80% accuracy on EuroSAT (3.26 M parameters, 201.68 MFLOPS), 91.24% overall accuracy and 92.18% average accuracy on Houston 2013, and 0.802 F1-score/0.660 IoU on DeepGlobe. |
| Document type: | article |
| File description: | electronic resource |
| Language: | English |
| ISSN: | 2574-5417 2096-4471 |
| Relation: | https://doaj.org/toc/2096-4471; https://doaj.org/toc/2574-5417 |
| DOI: | 10.1080/20964471.2025.2587987 |
| Access URL: | https://doaj.org/article/76a60f54924449f08b48822a657b7002 |
| Accession number: | edsdoj.76a60f54924449f08b48822a657b7002 |
| Database: | Directory of Open Access Journals |
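The abstract names two mechanisms that are easier to follow with a small example: SE blocks that recalibrate channel-wise features, and attention pooling that weights important regions. The sketch below is illustrative only and is not the authors' implementation. It assumes the standard squeeze-and-excitation form (global average pool, bottleneck layer with ReLU, sigmoid gate, biases omitted) and a single-query softmax pooling, which the record does not confirm for this paper. The function names `se_recalibrate` and `attention_pool`, the weight matrices `w1` and `w2`, and the `query` vector are all hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def se_recalibrate(feature_map, w1, w2):
    """Squeeze-and-excitation over a feature map (assumed standard form).

    feature_map: list of C channels, each an H x W list of lists.
    w1: (C // r) x C bottleneck weights; w2: C x (C // r) weights.
    Both weight matrices are hypothetical stand-ins for learned parameters.
    """
    # Squeeze: global average pool reduces each channel to one scalar.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[v * g for v in row] for row in ch]
        for ch, g in zip(feature_map, gates)
    ]
    return scaled, gates


def attention_pool(tokens, query):
    """Softmax-weighted average of token vectors (assumed single-query form).

    tokens: list of N vectors of dimension D; query: vector of dimension D.
    """
    dim = len(query)
    scores = [
        sum(t_i * q_i for t_i, q_i in zip(t, query)) / math.sqrt(dim)
        for t in tokens
    ]
    # Numerically stable softmax over the N scores.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Pooled vector: weighted sum of tokens, one value per dimension.
    pooled = [sum(w * t[d] for w, t in zip(weights, tokens)) for d in range(dim)]
    return pooled, weights


if __name__ == "__main__":
    fmap = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]
    _, gates = se_recalibrate(fmap, w1=[[1.0, 1.0]], w2=[[0.0], [1.0]])
    print("channel gates:", gates)
    pooled, weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], [10.0, 0.0])
    print("pooling weights:", weights)
```

In this toy setup the gates rescale whole channels, while the pooling weights rescale spatial tokens, which is the distinction the abstract draws between channel recalibration and region weighting.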