MCPT: Mixed Convolutional Parallel Transformer for Polarimetric SAR Image Classification

Detailed bibliography
Title: MCPT: Mixed Convolutional Parallel Transformer for Polarimetric SAR Image Classification
Authors: Wenke Wang, Jianlong Wang, Bibo Lu, Boyuan Liu, Yake Zhang, Chunyang Wang
Source: Remote Sensing, Vol 15, Iss 11, p 2936 (2023)
Publisher information: MDPI AG
Publication year: 2023
Collection: Directory of Open Access Journals: DOAJ Articles
Subjects: polarimetric SAR, convolutional neural network, vision transformer, mixed depthwise convolution tokenization, parallel encoder, global average pooling, Science
Description: Vision transformers (ViT) require massive training data and have complex models, so they cannot be directly applied to polarimetric synthetic aperture radar (PolSAR) image classification tasks. Therefore, a mixed convolutional parallel transformer (MCPT) model based on ViT is proposed for fast PolSAR image classification. First, a mixed depthwise convolution tokenization is introduced, replacing the learnable linear projection of the original ViT to obtain patch embeddings. This tokenization reduces computational and parameter complexity and extracts features with different receptive fields as input to the encoder. Furthermore, drawing on the idea that shallow networks have lower latency and are easier to optimize, a parallel encoder is implemented by pairing identical modules and recombining them into parallel blocks, which decreases the network depth and the computing power required. In addition, the original class embedding and position embedding are removed during tokenization, and a global average pooling layer is added after the encoder for category feature extraction. Finally, experimental results on the AIRSAR Flevoland and RADARSAT-2 San Francisco datasets show that the proposed method achieves a significant improvement in training and prediction speed, with overall accuracies of 97.9% and 96.77%, respectively.
Document type: article in journal/newspaper
Language: English
Relation: https://www.mdpi.com/2072-4292/15/11/2936; https://doaj.org/toc/2072-4292; https://doaj.org/article/1d3f011a206e47dd9deda43a270647b0
DOI: 10.3390/rs15112936
Availability: https://doi.org/10.3390/rs15112936
https://doaj.org/article/1d3f011a206e47dd9deda43a270647b0
Accession number: edsbas.FC53F6DB
Database: BASE
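
The three architectural ideas named in the abstract (mixed depthwise convolution tokenization, parallel blocks, and global average pooling in place of a class embedding) can be illustrated with a minimal plain-Python sketch. This is not the authors' implementation: every function name, the even channel split across kernel sizes, the 1x1 projection after the depthwise step, and all sizes used below are assumptions made for illustration only.

```python
# Illustrative sketch only. Names, sizes, and the exact layer layout are
# assumptions and are not taken from the article.

def linear_projection_params(patch, channels, dim):
    """Weights plus biases of a ViT-style linear patch projection."""
    return patch * patch * channels * dim + dim


def mixed_depthwise_params(channels, kernel_sizes, dim):
    """Parameters of a mixed depthwise convolution plus a 1x1 projection.

    Assumption: channels are split as evenly as possible across the kernel
    sizes, each channel gets one k x k filter with a bias (depthwise), and
    a pointwise 1x1 convolution then maps `channels` to `dim`.
    """
    base, extra = divmod(channels, len(kernel_sizes))
    depthwise = 0
    for i, k in enumerate(kernel_sizes):
        group_channels = base + (1 if i < extra else 0)
        depthwise += group_channels * (k * k + 1)
    pointwise = channels * dim + dim
    return depthwise + pointwise


def parallel_block(x, branches):
    """Residual sum of branches applied to the same input: x + sum_i f_i(x)."""
    out = list(x)
    for branch in branches:
        out = [o + b for o, b in zip(out, branch(x))]
    return out


def global_average_pool(tokens):
    """Average a list of token vectors into one feature vector."""
    n = len(tokens)
    return [sum(column) / n for column in zip(*tokens)]


if __name__ == "__main__":
    # Hypothetical sizes: 8x8 patches, 9 input channels, 64-dim embeddings.
    print(linear_projection_params(8, 9, 64))
    print(mixed_depthwise_params(9, (3, 5, 7), 64))
    # Two toy branches sharing one input, instead of being stacked in depth.
    print(parallel_block([1.0, 2.0], [lambda v: [2 * t for t in v],
                                      lambda v: [1.0 for _ in v]]))
    # Pooling over tokens replaces a dedicated class token.
    print(global_average_pool([[1.0, 2.0], [3.0, 4.0]]))
```

Under these assumed sizes the linear projection needs 36,928 parameters and the mixed depthwise variant 898, which shows why such a tokenization can lower parameter complexity; the figures say nothing about the model sizes actually reported in the article.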