CapViT: Cross-context capsule vision transformers for land cover classification with airborne multispectral LiDAR data


Detailed bibliography
Published in: International Journal of Applied Earth Observation and Geoinformation, Vol. 111, p. 102837
Main authors: Yu, Yongtao, Jiang, Tao, Gao, Junyong, Guan, Haiyan, Li, Dilong, Gao, Shangbing, Tang, E, Wang, Wenhao, Tang, Peng, Li, Jonathan
Medium: Journal Article
Language: English
Published: Elsevier B.V., 01.07.2022
ISSN: 1569-8432, 1872-826X
Description
Summary:
•Capsule vision transformer formulation for entity-aware feature extraction.
•Cross-context transformer encoders for high-quality feature embedding.
•Dual-path multi-head self-attention modules for feature semantic promotion.
•Cross-context capsule vision transformer for land cover classification.

Equipped with laser scanners operating on multiple channels, multispectral light detection and ranging (MS-LiDAR) devices offer greater potential for earth observation tasks than their single-band counterparts. This also opens up a competitive approach to land cover mapping with MS-LiDAR data. In this paper, we develop a cross-context capsule vision transformer (CapViT) for land cover classification with MS-LiDAR data. Specifically, the CapViT comprises three streams of capsule transformer encoders, each built by stacking capsule transformer (CapFormer) blocks, to exploit long-range global feature interactions at different context scales. The resulting cross-context feature semantics are then fused to drive accurate land cover type inference. In addition, each CapFormer block runs dual-path multi-head self-attention modules in parallel to capture both spatial token correlations and channel feature interdependencies, which significantly enriches the semantics of the feature encodings. Consequently, the semantically enriched feature encodings boost the distinctiveness and quality of the feature representations, and the land cover classification accuracy is effectively improved. The CapViT is thoroughly evaluated on two MS-LiDAR datasets. Both quantitative assessments and comparative analyses demonstrate the competitive capability and strong performance of the CapViT on land cover classification tasks.
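The dual-path idea described above (one attention path over spatial tokens, one over feature channels, run in parallel and fused) can be illustrated with a minimal single-head NumPy sketch. This is a hypothetical simplification, not the paper's implementation: it omits the learned Q/K/V projections, multi-head splitting, and the capsule formulation of the actual CapFormer block, and all function names are ours.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Single-head scaled dot-product self-attention over the rows of x.

    Identity projections are used for brevity; a real block would apply
    learned Q/K/V weight matrices before computing the scores.
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)   # pairwise row-to-row similarities
    return softmax(scores) @ x      # attention-weighted mixture of rows

def dual_path_attention(x):
    """Spatial path (tokens attend to tokens) and channel path (channels
    attend to channels) computed in parallel, fused here by addition."""
    spatial = self_attention(x)       # (n_tokens, n_channels)
    channel = self_attention(x.T).T   # attend across channels, transpose back
    return spatial + channel

rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 8))  # 16 spatial tokens, 8 feature channels
out = dual_path_attention(tokens)
print(out.shape)  # (16, 8)
```

The fusion rule (summation above) is an assumption for illustration; the point is only that the two attention paths read the same token matrix along orthogonal axes, so spatial correlations and channel interdependencies are modeled without one path constraining the other.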
DOI: 10.1016/j.jag.2022.102837