PVT v2: Improved baselines with Pyramid Vision Transformer

Transformers have recently led to encouraging progress in computer vision. In this work, we present new baselines by improving the original Pyramid Vision Transformer (PVT v1) with three added designs: (i) a linear complexity attention layer, (ii) an overlapping patch embedding, and (iii) a convolut...

Bibliographic details
Published in:	Computational visual media (Beijing), Vol. 8, No. 3, pp. 415-424
Main authors:	Wang, Wenhai; Xie, Enze; Li, Xiang; Fan, Deng-Ping; Song, Kaitao; Liang, Ding; Lu, Tong; Luo, Ping; Shao, Ling
Format:	Journal Article
Language:	English
Published:	Beijing: Tsinghua University Press, 1 September 2022; Springer Nature B.V.
Subjects:	Archives & records; Artificial Intelligence; Classification; Complexity; Computer Graphics; Computer Science; Computer vision; Image Processing and Computer Vision; Linearity; Research Article; Segmentation; Semantics; User Interfaces and Human Computer Interaction; transformers; object detection; semantic segmentation; image classification; dense prediction
ISSN:	2096-0433, 2096-0662