Multi-Scale Kolmogorov-Arnold Network (KAN)-Based Linear Attention Network: Multi-Scale Feature Fusion with KAN and Deformable Convolution for Urban Scene Image Semantic Segmentation

The introduction of an attention mechanism in remote sensing image segmentation improves the accuracy of the segmentation. In this paper, a novel multi-scale KAN-based linear attention (MKLA) segmentation network of MKLANet is developed to promote a better segmentation result. A hybrid global–local...

Celý popis

Uložené v:

Podrobná bibliografia
Vydané v:	Remote sensing (Basel, Switzerland) Ročník 17; číslo 5; s. 802
Hlavní autori:	Li, Yuanhang, Liu, Shuo, Wu, Jie, Sun, Weichao, Wen, Qingke, Wu, Yibiao, Qin, Xiujuan, Qiao, Yanyou
Médium:	Journal Article
Jazyk:	English
Vydavateľské údaje:	Basel MDPI AG 01.03.2025
Predmet:	Accuracy Analysis attention mechanism Blocking Coders Convolution Datasets deformable convolution Deformation Feature extraction Formability Image processing Image segmentation multi-scale feature fusion Remote sensing remote sensing image segmentation Semantic segmentation Semantics Spatial data Washington, D.C
ISSN:	2072-4292, 2072-4292
On-line prístup:	Získať plný text
Tagy:	Pridať tag Žiadne tagy, Buďte prvý, kto otaguje tento záznam!

Popis
Shrnutí:	The introduction of an attention mechanism in remote sensing image segmentation improves the accuracy of the segmentation. In this paper, a novel multi-scale KAN-based linear attention (MKLA) segmentation network of MKLANet is developed to promote a better segmentation result. A hybrid global–local attention mechanism in a feature decoder is designed to enhance the ability of aggregating the global–local context and avoiding potential blocking artifacts for feature extraction and segmentation. The local attention channel adopts MKLA block by bringing the merits of KAN convolution in Mamba like the linear attention block to improve the ability of handling linear and nonlinear feature and complex function approximation with a few extra computations. The global attention channel uses long-range cascade encoder–decoder block, where it mainly employs the 7 × 7 depth-wise convolution token mixer and lightweight 7 × 7 dilated deep convolution to capture the long-distance spatial features field and retain key spatial information. In addition, to enrich the input of the attention block, a deformable convolution module is developed between the encoder output and corresponding scale decoder, which can improve the expression ability of the segmentation model without increasing the depth of the network. The experimental results of the Vaihingen dataset (83.68% in mIoU, 92.98 in OA, and 91.08 in mF1), the UAVid dataset (69.78% in mIoU, 96.51 in OA), the LoveDA dataset (51.53% in mIoU, 86.42% in OA, and 67.19% in mF1), and the Potsdam dataset (97.14% in mIoU, 92.64% in OA, and 93.8% in mF1) outperform other advanced attention-based approaches in terms of small targets and edges’ segmentation.
Bibliografia:	ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14
ISSN:	2072-4292 2072-4292
DOI:	10.3390/rs17050802