PVT v2: Improved baselines with Pyramid Vision Transformer

Bibliographic details
Published in:	Computational Visual Media (Beijing), Vol. 8, No. 3, pp. 415-424
Main authors:	Wang, Wenhai, Xie, Enze, Li, Xiang, Fan, Deng-Ping, Song, Kaitao, Liang, Ding, Lu, Tong, Luo, Ping, Shao, Ling
Format:	Journal Article
Language:	English
Published:	Beijing: Tsinghua University Press; Springer Nature B.V., 1 September 2022
Subjects:	Archives & records; Artificial Intelligence; Classification; Complexity; Computer Graphics; Computer Science; Computer vision; Image Processing and Computer Vision; Linearity; Research Article; Segmentation; Semantics; User Interfaces and Human Computer Interaction; transformers; object detection; semantic segmentation; image classification; dense prediction
ISSN:	2096-0433, 2096-0662

Description
Abstract:	Transformers have recently led to encouraging progress in computer vision. In this work, we present new baselines by improving the original Pyramid Vision Transformer (PVT v1) with three designs: (i) a linear-complexity attention layer, (ii) an overlapping patch embedding, and (iii) a convolutional feed-forward network. With these modifications, PVT v2 reduces the computational complexity of PVT v1 to linear and provides significant improvements on fundamental vision tasks such as classification, detection, and segmentation. In particular, PVT v2 achieves comparable or better performance than recent work such as the Swin Transformer. We hope this work will facilitate state-of-the-art transformer research in computer vision. Code is available at https://github.com/whai362/PVT .
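The abstract states only that the new attention layer has linear complexity. The following minimal NumPy sketch is not the authors' code. It assumes one way to get that property: average-pool the keys and values to a fixed p x p grid before attention (p = 7 here). The score matrix is then N x p² rather than N x N, so its size grows linearly with the token count N. The function names, the single-head layout, the requirement that the feature-map sides be divisible by p, and the random weights are all hypothetical simplifications.

```python
import numpy as np


def avg_pool_to_grid(x, h, w, p):
    """Average-pool an (h*w, c) token map to a fixed p x p grid of tokens.

    Simplifying assumption: h and w are divisible by p, so every pooling
    window has the same size (a stand-in for adaptive average pooling).
    """
    n, c = x.shape
    assert n == h * w and h % p == 0 and w % p == 0
    grid = x.reshape(h, w, c)
    # Split each spatial axis into p equal blocks and average inside each block.
    pooled = grid.reshape(p, h // p, p, w // p, c).mean(axis=(1, 3))
    return pooled.reshape(p * p, c)


def softmax(z):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def pooled_attention(x, h, w, wq, wk, wv, p=7):
    """Single-head attention with keys/values taken from a pooled p x p map.

    Queries keep all N = h*w tokens; keys and values use only p*p tokens,
    so the score matrix is (N, p*p) and its size grows linearly in N.
    """
    q = x @ wq                          # (N, d) queries from every token
    kv = avg_pool_to_grid(x, h, w, p)   # (p*p, c) pooled source for K and V
    k = kv @ wk                         # (p*p, d)
    v = kv @ wv                         # (p*p, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (N, p*p)
    return attn @ v, attn


# Usage: compare two feature-map sizes with the same (hypothetical) weights.
rng = np.random.default_rng(0)
c = d = 32
wq, wk, wv = (rng.standard_normal((c, d)) / np.sqrt(c) for _ in range(3))
for side in (28, 56):  # both divisible by p = 7
    x = rng.standard_normal((side * side, c))
    out, attn = pooled_attention(x, side, side, wq, wk, wv)
    print(side * side, "tokens -> attention matrix", attn.shape)
```

Going from 784 to 3136 tokens quadruples the attention matrix, from (784, 49) to (3136, 49). Full N x N attention would grow sixteenfold over the same change. For the actual PVT v2 layers, refer to the official repository linked in the abstract.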
DOI:	10.1007/s41095-022-0274-8