SRCBTFusion-Net: An Efficient Fusion Architecture via Stacked Residual Convolution Blocks and Transformer for Remote Sensing Image Semantic Segmentation

Convolutional neural network (CNN) and transformer-based self-attention models have their advantages in extracting local information and global semantic information, and it is a trend to design a model combining stacked residual convolution blocks (SRCBs) and transformer. How to efficiently integrat...

Celý popis

Uloženo v:
Podrobná bibliografie
Vydáno v:IEEE transactions on geoscience and remote sensing Ročník 61; s. 1 - 16
Hlavní autoři: Chen, Junsong, Yi, Jizheng, Chen, Aibin, Lin, Hui
Médium: Journal Article
Jazyk:angličtina
Vydáno: New York IEEE 2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Témata:
ISSN:0196-2892, 1558-0644
On-line přístup:Získat plný text
Tagy: Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
Popis
Shrnutí:Convolutional neural network (CNN) and transformer-based self-attention models have their advantages in extracting local information and global semantic information, and it is a trend to design a model combining stacked residual convolution blocks (SRCBs) and transformer. How to efficiently integrate the two mechanisms to improve the segmentation effect of remote sensing (RS) images is an urgent problem to be solved. An efficient fusion via SRCB and transformer (SRCBTFusion-Net) is proposed as a new semantic segmentation architecture for RS images. The SRCBTFusion-Net adopts an encoder-decoder structure and the Transformer is embedded into SRCB to form a double coding structure, then the coding features are upsampled and fused with multiscale features of SRCB to form a decoding structure. First, a semantic information enhancement module (SIEM) is proposed to get global clues for enhancing deep semantic information. Subsequently, the relationship guidance module (RGM) is incorporated to reencode the decoder's upsampled feature maps, enhancing the edge segmentation performance. Second, a multipath atrous self-attention module (MASM) is developed to enhance the effective selection and weighting of low-level features, effectively reducing the potential confusion introduced by the skip connections between low- and high-level features. Finally, a multiscale feature aggregation module (MFAM) is developed to enhance the extraction of semantic and contextual information, thus alleviating the loss of image feature information and improving the ability to identify similar categories. The proposed SRCBTFusion-Net's performance on the Vaihingen and Potsdam datasets is superior to the state-of-the-art methods. The code will be freely available at https://github.com/js257/SRCBTFusion-Net .
Bibliografie:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:0196-2892
1558-0644
DOI:10.1109/TGRS.2023.3336689