Token Contrast for Weakly-Supervised Semantic Segmentation

Bibliographic details
Published in: Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), pp. 3093-3102
Main authors: Ru, Lixiang; Zheng, Heliang; Zhan, Yibing; Du, Bo
Format: Conference paper
Language: English
Published: IEEE, 1 June 2023
ISSN: 1063-6919
Abstract Weakly-Supervised Semantic Segmentation (WSSS) using image-level labels typically utilizes Class Activation Map (CAM) to generate the pseudo labels. Limited by the local structure perception of CNN, CAM usually cannot identify the integral object regions. Though the recent Vision Transformer (ViT) can remedy this flaw, we observe it also brings the over-smoothing issue, i.e., the final patch tokens incline to be uniform. In this work, we propose Token Contrast (ToCo) to address this issue and further explore the virtue of ViT for WSSS. Firstly, motivated by the observation that intermediate layers in ViT can still retain semantic diversity, we designed a Patch Token Contrast module (PTC). PTC supervises the final patch tokens with the pseudo token relations derived from intermediate layers, allowing them to align the semantic regions and thus yield more accurate CAM. Secondly, to further differentiate the low-confidence regions in CAM, we devised a Class Token Contrast module (CTC) inspired by the fact that class tokens in ViT can capture high-level semantics. CTC facilitates the representation consistency between uncertain local regions and global objects by contrasting their class tokens. Experiments on the PASCAL VOC and MS COCO datasets show the proposed ToCo can remarkably surpass other single-stage competitors and achieve comparable performance with state-of-the-art multi-stage methods. Code is available at https://github.com/rulixiang/ToCo.
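Note: the patch-token idea in the abstract (supervising final-layer tokens with pairwise relations taken from an intermediate layer) can be illustrated with a small NumPy sketch. This is not the authors' implementation, which is available at the repository linked above. The function names, the thresholds `pos_thresh` and `neg_thresh`, and the log-loss form are all assumptions chosen for illustration only.

```python
import numpy as np


def cosine_similarity_matrix(tokens):
    """Pairwise cosine similarity between the rows of an (N, D) token matrix."""
    norms = np.linalg.norm(tokens, axis=1, keepdims=True)
    unit = tokens / np.clip(norms, 1e-8, None)
    return unit @ unit.T


def pseudo_relations(intermediate_tokens, pos_thresh=0.8, neg_thresh=0.2):
    """Derive pseudo positive/negative token pairs from intermediate tokens.

    The thresholds are hypothetical values, not taken from the paper.
    """
    sim = cosine_similarity_matrix(intermediate_tokens)
    off_diag = ~np.eye(len(sim), dtype=bool)  # ignore each token paired with itself
    positive = (sim > pos_thresh) & off_diag
    negative = (sim < neg_thresh) & off_diag
    return positive, negative


def token_contrast_loss(final_tokens, positive, negative):
    """Assumed loss: pull positive pairs together, push negative pairs apart."""
    sim = cosine_similarity_matrix(final_tokens)
    # Map cosine similarity from [-1, 1] into (0, 1) so it can act as a probability.
    prob = np.clip((sim + 1.0) / 2.0, 1e-6, 1.0 - 1e-6)
    pos_loss = -np.log(prob[positive]).mean() if positive.any() else 0.0
    neg_loss = -np.log(1.0 - prob[negative]).mean() if negative.any() else 0.0
    return float(pos_loss + neg_loss)


# Toy example: four patch tokens forming two semantic groups in the intermediate layer.
intermediate = np.array([[1.0, 0.0], [1.0, 0.05], [0.0, 1.0], [0.05, 1.0]])
positive, negative = pseudo_relations(intermediate)

# Over-smoothed final tokens are nearly identical; diverse ones keep the two groups.
over_smoothed = np.array([[1.0, 1.0], [1.0, 1.01], [1.01, 1.0], [1.0, 1.0]])
diverse = intermediate.copy()

print("loss for over-smoothed tokens:", token_contrast_loss(over_smoothed, positive, negative))
print("loss for diverse tokens:", token_contrast_loss(diverse, positive, negative))
```

In this toy setting the over-smoothed tokens receive the larger loss, because pairs marked as negatives by the intermediate layer are almost identical in the final layer. A training setup would compute such a loss on real ViT token embeddings inside an autodiff framework rather than on hand-written arrays.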
Author Ru, Lixiang
Du, Bo
Zheng, Heliang
Zhan, Yibing
Author_xml – sequence: 1
  givenname: Lixiang
  surname: Ru
  fullname: Ru, Lixiang
  email: rulixiang@whu.edu.cn
  organization: Institute of Artificial Intelligence, School of Computer Science, National Engineering Research Center for Multimedia Software, Wuhan University,Hubei Key Laboratory of Multimedia and Network Communication Engineering,China
– sequence: 2
  givenname: Heliang
  surname: Zheng
  fullname: Zheng, Heliang
  email: zhengheliang@jd.com
  organization: JD Explore Academy,China
– sequence: 3
  givenname: Yibing
  surname: Zhan
  fullname: Zhan, Yibing
  email: zhanyibing@jd.com
  organization: JD Explore Academy,China
– sequence: 4
  givenname: Bo
  surname: Du
  fullname: Du, Bo
  email: dubo@whu.edu.cn
  organization: Institute of Artificial Intelligence, School of Computer Science, National Engineering Research Center for Multimedia Software, Wuhan University,Hubei Key Laboratory of Multimedia and Network Communication Engineering,China
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1109/CVPR52729.2023.00302
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Applied Sciences
EISBN 9798350301298
EISSN 1063-6919
EndPage 3102
ExternalDocumentID 10204562
Genre orig-research
GrantInformation_xml – fundername: National Natural Science Foundation of China
  grantid: 62225113,62002090
  funderid: 10.13039/501100001809
ISICitedReferencesCount 133
IsPeerReviewed false
IsScholarly true
Language English
PageCount 10
ParticipantIDs ieee_primary_10204562
PublicationCentury 2000
PublicationDate 2023-June
PublicationDateYYYYMMDD 2023-06-01
PublicationDate_xml – month: 06
  year: 2023
  text: 2023-June
PublicationDecade 2020
PublicationTitle Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online)
PublicationTitleAbbrev CVPR
PublicationYear 2023
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 3093
SubjectTerms Codes
Computer vision
Image analysis
Pattern recognition
Scene analysis and understanding
Semantic segmentation
Semantics
Transformers
Title Token Contrast for Weakly-Supervised Semantic Segmentation
URI https://ieeexplore.ieee.org/document/10204562