TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation
Although vision transformers (ViTs) have achieved great success in computer vision, the heavy computational cost hampers their applications to dense prediction tasks such as semantic segmentation on mobile devices. In this paper, we present a mobile-friendly architecture named Token Pyramid Vision T...
| Published in: | Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), pp. 12073-12083 |
|---|---|
| Main authors: | Wenqiang Zhang, Zilong Huang, Guozhong Luo, Tao Chen, Xinggang Wang, Wenyu Liu, Gang Yu, Chunhua Shen |
| Format: | Conference paper |
| Language: | English |
| Publication details: | IEEE, 1 June 2022 |
| Subject: | |
| ISSN: | 1063-6919 |
| Online access: | Get full text |
| Abstract | Although vision transformers (ViTs) have achieved great success in computer vision, the heavy computational cost hampers their applications to dense prediction tasks such as semantic segmentation on mobile devices. In this paper, we present a mobile-friendly architecture named Token Pyramid Vision Transformer (TopFormer). The proposed TopFormer takes tokens from various scales as input to produce scale-aware semantic features, which are then injected into the corresponding tokens to augment the representation. Experimental results demonstrate that our method significantly outperforms CNN- and ViT-based networks across several semantic segmentation datasets and achieves a good trade-off between accuracy and latency. On the ADE20K dataset, TopFormer achieves 5% higher accuracy in mIoU than MobileNetV3 with lower latency on an ARM-based mobile device. Furthermore, the tiny version of TopFormer achieves real-time inference on an ARM-based mobile device with competitive results. The code and models are available at: https://github.com/hustvl/TopFormer. |
|---|---|
| Author | Zhang, Wenqiang; Shen, Chunhua; Wang, Xinggang; Luo, Guozhong; Yu, Gang; Chen, Tao; Huang, Zilong; Liu, Wenyu |
| Author_xml | – sequence: 1 givenname: Wenqiang surname: Zhang fullname: Zhang, Wenqiang organization: Huazhong University of Science and Technology,China – sequence: 2 givenname: Zilong surname: Huang fullname: Huang, Zilong organization: Tencent PCG,China – sequence: 3 givenname: Guozhong surname: Luo fullname: Luo, Guozhong organization: Tencent PCG,China – sequence: 4 givenname: Tao surname: Chen fullname: Chen, Tao organization: Fudan University,China – sequence: 5 givenname: Xinggang surname: Wang fullname: Wang, Xinggang email: xgwang@hust.edu.cn organization: Huazhong University of Science and Technology,China – sequence: 6 givenname: Wenyu surname: Liu fullname: Liu, Wenyu email: liuwy@hust.edu.cn organization: Huazhong University of Science and Technology,China – sequence: 7 givenname: Gang surname: Yu fullname: Yu, Gang organization: Tencent PCG,China – sequence: 8 givenname: Chunhua surname: Shen fullname: Shen, Chunhua organization: Zhejiang University,China |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IH CBEJK RIE RIO |
| DOI | 10.1109/CVPR52688.2022.01177 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan (POP) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP) 1998-present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Applied Sciences |
| EISBN | 1665469463 9781665469463 |
| EISSN | 1063-6919 |
| EndPage | 12083 |
| ExternalDocumentID | 9880208 |
| Genre | orig-research |
| GrantInformation_xml | – fundername: NSFC grantid: 61733007,61876212,62071127,61773176 funderid: 10.13039/501100001809 |
| GroupedDBID | 6IE 6IH 6IL 6IN AAWTH ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IJVOP OCL RIE RIL RIO |
| ID | FETCH-LOGICAL-i203t-23dba77dae0721d661a2315157ae7d6d5201cfb98c0df706974bd1a338fde9af3 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 258 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000870759105016&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Wed Aug 27 02:15:10 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-i203t-23dba77dae0721d661a2315157ae7d6d5201cfb98c0df706974bd1a338fde9af3 |
| PageCount | 11 |
| ParticipantIDs | ieee_primary_9880208 |
| PublicationCentury | 2000 |
| PublicationDate | 2022-June |
| PublicationDateYYYYMMDD | 2022-06-01 |
| PublicationDate_xml | – month: 06 year: 2022 text: 2022-June |
| PublicationDecade | 2020 |
| PublicationTitle | Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online) |
| PublicationTitleAbbrev | CVPR |
| PublicationYear | 2022 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssj0003211698 |
| Score | 2.6487713 |
| Snippet | Although vision transformers (ViTs) have achieved great success in computer vision, the heavy computational cost hampers their applications to dense prediction... |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 12073 |
| SubjectTerms | Computer architecture; Computer vision; Deep learning; Deep learning architectures and techniques; Segmentation grouping and shape analysis; Mobile handsets; Semantics; Shape; Transformers |
| Title | TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation |
| URI | https://ieeexplore.ieee.org/document/9880208 |
| WOSCitedRecordID | wos000870759105016&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
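
The abstract reports segmentation accuracy in mIoU (mean intersection over union). As a minimal sketch of how that metric is commonly computed, and not the authors' evaluation code, the snippet below averages per-class IoU over flat lists of predicted and ground-truth class labels. The function name `mean_iou` and the toy labels are assumptions made for illustration only.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union, averaged over classes that appear
    in either the prediction or the ground truth.

    pred, target: equal-length sequences of integer class labels.
    (Hypothetical helper for illustration; not taken from the TopFormer code.)
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            # Pixel counts once toward both intersection and union of its class.
            intersection[p] += 1
            union[p] += 1
        else:
            # Mismatch: the pixel enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six "pixels".
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(score)
```

Benchmark implementations typically accumulate the per-class counts over the whole validation set and skip an "ignore" label before averaging; this sketch omits both for brevity.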