SRCBTFusion-Net: An Efficient Fusion Architecture via Stacked Residual Convolution Blocks and Transformer for Remote Sensing Image Semantic Segmentation
Convolutional neural network (CNN) and transformer-based self-attention models have their advantages in extracting local information and global semantic information, and it is a trend to design a model combining stacked residual convolution blocks (SRCBs) and transformer. How to efficiently integrat...
Uloženo v:
| Vydáno v: | IEEE transactions on geoscience and remote sensing Ročník 61; s. 1 - 16 |
|---|---|
| Hlavní autoři: | , , , |
| Médium: | Journal Article |
| Jazyk: | angličtina |
| Vydáno: |
New York
IEEE
2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Témata: | |
| ISSN: | 0196-2892, 1558-0644 |
| On-line přístup: | Získat plný text |
| Tagy: |
Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
|
| Shrnutí: | Convolutional neural network (CNN) and transformer-based self-attention models have their advantages in extracting local information and global semantic information, and it is a trend to design a model combining stacked residual convolution blocks (SRCBs) and transformer. How to efficiently integrate the two mechanisms to improve the segmentation effect of remote sensing (RS) images is an urgent problem to be solved. An efficient fusion via SRCB and transformer (SRCBTFusion-Net) is proposed as a new semantic segmentation architecture for RS images. The SRCBTFusion-Net adopts an encoder-decoder structure and the Transformer is embedded into SRCB to form a double coding structure, then the coding features are upsampled and fused with multiscale features of SRCB to form a decoding structure. First, a semantic information enhancement module (SIEM) is proposed to get global clues for enhancing deep semantic information. Subsequently, the relationship guidance module (RGM) is incorporated to reencode the decoder's upsampled feature maps, enhancing the edge segmentation performance. Second, a multipath atrous self-attention module (MASM) is developed to enhance the effective selection and weighting of low-level features, effectively reducing the potential confusion introduced by the skip connections between low- and high-level features. Finally, a multiscale feature aggregation module (MFAM) is developed to enhance the extraction of semantic and contextual information, thus alleviating the loss of image feature information and improving the ability to identify similar categories. The proposed SRCBTFusion-Net's performance on the Vaihingen and Potsdam datasets is superior to the state-of-the-art methods. The code will be freely available at https://github.com/js257/SRCBTFusion-Net . |
|---|---|
| Bibliografie: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
| ISSN: | 0196-2892 1558-0644 |
| DOI: | 10.1109/TGRS.2023.3336689 |