A Simple Framework for Text-Supervised Semantic Segmentation
| Published in: | Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), pp. 7071-7080 |
|---|---|
| Main authors: | Yi, Muyang; Cui, Quan; Wu, Hao; Yang, Cheng; Yoshie, Osamu; Lu, Hongtao |
| Format: | Conference paper |
| Language: | English, Japanese |
| Publication details: | IEEE, 2023-06-01 |
| Subjects: | |
| ISSN: | 1063-6919 |
| Online access: | Get full text |
| Abstract | Text-supervised semantic segmentation is a novel research topic that allows semantic segments to emerge from image-text contrasting. However, pioneering methods can be tied to specifically designed network architectures. This paper shows that a vanilla contrastive language-image pretraining (CLIP) model is an effective text-supervised semantic segmentor by itself. First, we reveal that a vanilla CLIP performs poorly at localization and segmentation because its optimization is driven by densely aligning visual and language representations. Second, we propose locality-driven alignment (LoDA) to address the problem, where CLIP optimization is driven by sparsely aligning local representations. Third, we propose a simple segmentation (SimSeg) framework. LoDA and SimSeg jointly ameliorate a vanilla CLIP to produce impressive semantic segmentation results. Our method outperforms previous state-of-the-art methods on the PASCAL VOC 2012, PASCAL Context, and COCO datasets by large margins. Code and models are available at github.com/muyangyi/SimSeg. |
|---|---|
| Author | Yang, Cheng; Yoshie, Osamu; Lu, Hongtao; Cui, Quan; Wu, Hao; Yi, Muyang |
| Author_xml | 1. Yi, Muyang (AI Institute, Shanghai Jiao Tong University; MoE Key Lab of Artificial Intelligence; Department of Computer Science and Engineering); 2. Cui, Quan (Waseda University); 3. Wu, Hao (ByteDance Inc); 4. Yang, Cheng (ByteDance Inc); 5. Yoshie, Osamu (Waseda University); 6. Lu, Hongtao (AI Institute, Shanghai Jiao Tong University; MoE Key Lab of Artificial Intelligence; Department of Computer Science and Engineering) |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DOI | 10.1109/CVPR52729.2023.00683 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP) 1998-present |
| Discipline | Applied Sciences |
| EISBN | 9798350301298 |
| EISSN | 1063-6919 |
| EndPage | 7080 |
| ExternalDocumentID | 10203335 |
| Genre | orig-research |
| GrantInformation_xml | Funder: NSFC; grant ID: 62176155; funder ID: 10.13039/501100001807 |
| ISICitedReferencesCount | 29 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English Japanese |
| PageCount | 10 |
| PublicationDate | 2023-06-01 |
| PublicationTitle | Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online) |
| PublicationTitleAbbrev | CVPR |
| PublicationYear | 2023 |
| Publisher | IEEE |
| StartPage | 7071 |
| SubjectTerms | Codes; Computer vision; Location awareness; Network architecture; Semantic segmentation; Semantics; Vision, language, and reasoning; Visualization |
| Title | A Simple Framework for Text-Supervised Semantic Segmentation |
| URI | https://ieeexplore.ieee.org/document/10203335 |
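
The abstract contrasts densely aligning visual and language representations with sparsely aligning local ones. The sketch below is one possible reading of that contrast, not the authors' implementation: it assumes that dense alignment corresponds to averaging all local tokens (image patches or words) before a CLIP-style contrastive loss, and that sparse alignment corresponds to keeping only the strongest response per channel. The function names, the pooling choices, and the temperature value are all illustrative assumptions; the released code at github.com/muyangyi/SimSeg is the authoritative reference.

```python
import math

def mean_pool(tokens):
    """Dense pooling (assumed): average every local token per channel."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]

def max_pool(tokens):
    """Sparse pooling (assumed): keep only the strongest local response per channel."""
    return [max(col) for col in zip(*tokens)]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def contrastive_loss(image_tokens, text_tokens, pool, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired (image, text) token sets."""
    img = [pool(t) for t in image_tokens]
    txt = [pool(t) for t in text_tokens]
    logits = [[cosine(i, t) / temperature for t in txt] for i in img]

    def cross_entropy(rows):
        # The matching pair sits on the diagonal of the similarity matrix.
        total = 0.0
        for k, row in enumerate(rows):
            m = max(row)
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[k]
        return total / len(rows)

    # Average the image-to-text and text-to-image directions.
    cols = [list(c) for c in zip(*logits)]
    return 0.5 * (cross_entropy(logits) + cross_entropy(cols))
```

Passing `mean_pool` or `max_pool` as the `pool` argument switches between the two assumed alignment styles while the loss itself stays unchanged, which makes it easy to compare them on the same toy batch.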