Hybrid WideResNet-Dual Spatial Embedding Vision Transformer with SE blocks: a high-accuracy model for Land Use and Land Cover classification in remote sensing

Saved in:
Detailed bibliography
Title: Hybrid WideResNet-Dual Spatial Embedding Vision Transformer with SE blocks: a high-accuracy model for Land Use and Land Cover classification in remote sensing
Authors: Ijaz Hussain, Wei Chen, Yasir Iqbal, Anjum Iqbal, Si-Liang Li
Source: Big Earth Data, pp. 1-40 (2025)
Publisher information: Taylor & Francis Group, 2025.
Publication year: 2025
Collection: LCC:Geography. Anthropology. Recreation
LCC:Geology
Subjects: Land Use and Land Cover, Wide Residual Network, Vision Transformers, remote sensing, Squeeze-and-Excitation, cross-modal transformers, Geography. Anthropology. Recreation, Geology, QE1-996.5
Description: Land Use and Land Cover (LULC) classification is critical for environmental monitoring and sustainable resource management, but faces challenges in accurately capturing complex spatial-spectral features and long-range dependencies in remote sensing imagery. To address this, we introduce a Hybrid Wide Residual Network-Dual Spatial Positional Embedding Vision Transformer (WRN-DSPViT) framework enhanced with Squeeze-and-Excitation (SE) blocks and dual spatial positional embeddings. This model integrates a Wide Residual Network (WRN) for local spatial feature extraction and a Vision Transformer (ViT) with novel dual spatial encoding to capture global context via multi-head self-attention, where SE blocks dynamically recalibrate channel-wise features. Attention pooling is employed to fuse spatial features, allowing for adaptive weighting of important regions in the image, further enhancing classification accuracy. For multimodal hyperspectral-LiDAR data (Houston 2013), we extend this framework with parallel WRN-SE streams and cross-modal transformers, preserving spatial relationships through dual encodings. Evaluated on three benchmarks (EuroSAT multispectral, Houston 2013 hyperspectral-LiDAR, and DeepGlobe road extraction), the hybrid WRN-DSPViT achieves state-of-the-art performance: 98.80% accuracy on EuroSAT (3.26 M parameters, 201.68 MFLOPS), 91.24% overall accuracy and 92.18% average accuracy on Houston 2013, and 0.802 F1-score / 0.660 IoU on DeepGlobe.
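The channel recalibration the abstract attributes to the SE blocks (squeeze to a per-channel descriptor, excite through a bottleneck, rescale the feature map) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the bottleneck weights here are random and untrained, the reduction ratio `r=4` is an assumed value, and a real model would learn these weights end-to-end inside the WRN-SE streams.

```python
import numpy as np

def se_block(x, r=4):
    """Squeeze-and-Excitation channel recalibration (illustrative only).

    x : feature map of shape (C, H, W).
    r : bottleneck reduction ratio (assumed value, not from the paper).
    """
    rng = np.random.default_rng(0)
    c = x.shape[0]
    # Squeeze: global average pooling -> one descriptor per channel
    z = x.mean(axis=(1, 2))                                  # shape (C,)
    # Excitation: bottleneck MLP with random (untrained) weights for illustration
    w1 = rng.standard_normal((c // r, c)) * 0.1              # reduce C -> C/r
    w2 = rng.standard_normal((c, c // r)) * 0.1              # expand C/r -> C
    s = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(w1 @ z, 0.0))))  # sigmoid gate in (0, 1)
    # Scale: reweight each input channel by its learned importance
    return x * s[:, None, None]

feat = np.ones((8, 4, 4))      # toy 8-channel feature map
out = se_block(feat)
print(out.shape)               # same (8, 4, 4) shape, channels rescaled
```

The output keeps the input's shape; only the relative magnitude of each channel changes, which is why SE blocks can be dropped into residual streams without altering the surrounding architecture.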
Document type: article
File description: electronic resource
Language: English
ISSN: 2574-5417
2096-4471
Relation: https://doaj.org/toc/2096-4471; https://doaj.org/toc/2574-5417
DOI: 10.1080/20964471.2025.2587987
Access URL: https://doaj.org/article/76a60f54924449f08b48822a657b7002
Accession number: edsdoj.76a60f54924449f08b48822a657b7002
Database: Directory of Open Access Journals