PointCLIP: Point Cloud Understanding by CLIP
| Published in: | Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), pp. 8542-8552 |
|---|---|
| Main authors: | Zhang, Renrui; Guo, Ziyu; Zhang, Wei; Li, Kunchang; Miao, Xupeng; Cui, Bin; Qiao, Yu; Gao, Peng; Li, Hongsheng |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 2022-06-01 |
| Keywords: | |
| ISSN: | 1063-6919 |
| Online access: | Full text |
| Abstract | Recently, zero-shot and few-shot learning via Contrastive Vision-Language Pre-training (CLIP), which learns to match images with their corresponding texts in open-vocabulary settings, has shown inspiring performance on 2D visual recognition. However, it remains underexplored whether CLIP, pre-trained on large-scale 2D image-text pairs, can be generalized to 3D recognition. In this paper, we show that such a setting is feasible by proposing PointCLIP, which conducts alignment between CLIP-encoded point clouds and 3D category texts. Specifically, we encode a point cloud by projecting it onto multi-view depth maps and aggregate the view-wise zero-shot predictions in an end-to-end manner, which achieves efficient knowledge transfer from 2D to 3D. We further design an inter-view adapter to better extract the global feature and adaptively fuse the 3D few-shot knowledge into CLIP pre-trained in 2D. By fine-tuning only the adapter under few-shot settings, the performance of PointCLIP can be largely improved. In addition, we observe that PointCLIP and classical 3D-supervised networks carry complementary knowledge. Via a simple ensemble during inference, PointCLIP improves the performance of state-of-the-art 3D networks. Therefore, PointCLIP is a promising alternative for effective 3D point cloud understanding in the low-data regime with marginal resource cost. We conduct thorough experiments on ModelNet10, ModelNet40 and ScanObjectNN to demonstrate the effectiveness of PointCLIP. Code is available at https://github.com/ZrrSkywalker/PointCLIP. |
|---|---|
| Author | Qiao, Yu; Li, Kunchang; Zhang, Wei; Gao, Peng; Guo, Ziyu; Cui, Bin; Zhang, Renrui; Miao, Xupeng; Li, Hongsheng |
| Author_xml | 1. Zhang, Renrui (Shanghai AI Laboratory; zhangrenrui@pjlab.org.cn); 2. Guo, Ziyu (Peking University, School of CS and Key Lab of HCST); 3. Zhang, Wei (Shanghai AI Laboratory); 4. Li, Kunchang (Shanghai AI Laboratory); 5. Miao, Xupeng (Peking University, School of CS and Key Lab of HCST); 6. Cui, Bin (Peking University, School of CS and Key Lab of HCST); 7. Qiao, Yu (Shanghai AI Laboratory); 8. Gao, Peng (Shanghai AI Laboratory; gaopeng@pjlab.org.cn); 9. Li, Hongsheng (The Chinese University of Hong Kong, CUHK-SenseTime Joint Laboratory; hsli@ee.cuhk.edu.hk) |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IH CBEJK RIE RIO |
| DOI | 10.1109/CVPR52688.2022.00836 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Xplore Digital Library; IEEE Proceedings Order Plans (POP) 1998-present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Xplore Digital Library url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Applied Sciences |
| EISBN | 1665469463 9781665469463 |
| EISSN | 1063-6919 |
| EndPage | 8552 |
| ExternalDocumentID | 9878980 |
| Genre | orig-research |
| GroupedDBID | 6IE 6IH 6IL 6IN AAWTH ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IJVOP OCL RIE RIL RIO |
| ID | FETCH-LOGICAL-i203t-a04b9ef45d9710381b6d08c2509a5003952080b711b4e25068f51adeee25ebb23 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 303 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000870759101058&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Wed Jun 25 06:01:13 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-i203t-a04b9ef45d9710381b6d08c2509a5003952080b711b4e25068f51adeee25ebb23 |
| PageCount | 11 |
| ParticipantIDs | ieee_primary_9878980 |
| PublicationCentury | 2000 |
| PublicationDate | 2022-June |
| PublicationDateYYYYMMDD | 2022-06-01 |
| PublicationDate_xml | – month: 06 year: 2022 text: 2022-June |
| PublicationDecade | 2020 |
| PublicationTitle | Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online) |
| PublicationTitleAbbrev | CVPR |
| PublicationYear | 2022 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssj0003211698 |
| Score | 2.6673737 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 8542 |
| SubjectTerms | 3D from multi-view and sensors; Transfer/low-shot/long-tail learning; Vision + language; Fuses; Image recognition; Knowledge engineering; Point cloud compression; Three-dimensional displays; Training; Visualization |
| Title | PointCLIP: Point Cloud Understanding by CLIP |
| URI | https://ieeexplore.ieee.org/document/9878980 |
| WOSCitedRecordID | wos000870759101058&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | IEEE |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=proceeding&rft.title=Proceedings+%28IEEE+Computer+Society+Conference+on+Computer+Vision+and+Pattern+Recognition.+Online%29&rft.atitle=PointCLIP%3A+Point+Cloud+Understanding+by+CLIP&rft.au=Zhang%2C+Renrui&rft.au=Guo%2C+Ziyu&rft.au=Zhang%2C+Wei&rft.au=Li%2C+Kunchang&rft.date=2022-06-01&rft.pub=IEEE&rft.eissn=1063-6919&rft.spage=8542&rft.epage=8552&rft_id=info:doi/10.1109%2FCVPR52688.2022.00836&rft.externalDocID=9878980 |
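
The abstract above describes a zero-shot pipeline: project a point cloud onto multi-view depth maps, score each view against the 3D category texts with CLIP, aggregate the view-wise predictions, and optionally ensemble with a 3D-supervised network. The following is a minimal, dependency-free Python sketch of that data flow under stated assumptions; it is not the authors' implementation (see the repository linked in the abstract). Assumed here and not taken from this record: an orthographic projection onto three hypothetical axis-aligned views (`VIEWS`), point coordinates normalized to [-1, 1], CLIP's image and text encoders replaced by placeholder feature vectors, a logit scale of 100, hypothetical per-view weights, and an additive ensemble weight `alpha`. The inter-view adapter is omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]

# Hypothetical views for an orthographic projection (an assumption, not the
# paper's exact setup): (image-u axis, image-v axis, depth axis).
VIEWS: Dict[str, Tuple[int, int, int]] = {
    "front": (0, 1, 2),
    "side": (2, 1, 0),
    "top": (0, 2, 1),
}


def project_to_depth_map(points: Sequence[Point], view: str, size: int = 8) -> List[List[float]]:
    """Scatter points onto a size x size grid, keeping the nearest depth per pixel.

    Coordinates are assumed normalized to [-1, 1]. Empty pixels stay 0.0.
    """
    u_axis, v_axis, d_axis = VIEWS[view]
    depth = [[0.0] * size for _ in range(size)]
    for p in points:
        # Map [-1, 1] to a pixel index, clamping points on the border.
        u = min(size - 1, max(0, int((p[u_axis] + 1.0) / 2.0 * size)))
        v = min(size - 1, max(0, int((p[v_axis] + 1.0) / 2.0 * size)))
        # Offset depth into [1, 3] so a filled pixel is never 0.0 (the empty value).
        d = p[d_axis] + 2.0
        if depth[v][u] == 0.0 or d < depth[v][u]:
            depth[v][u] = d  # z-buffer: the smallest depth (nearest point) wins
    return depth


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, the measure CLIP uses to compare image and text embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def view_logits(view_feature: Sequence[float], text_features: Sequence[Sequence[float]],
                scale: float = 100.0) -> List[float]:
    """Zero-shot logits for one view: scaled similarity to each category-text embedding."""
    return [scale * cosine(view_feature, t) for t in text_features]


def aggregate_views(view_features: Sequence[Sequence[float]],
                    text_features: Sequence[Sequence[float]],
                    view_weights: Sequence[float]) -> List[float]:
    """Weighted sum of the view-wise logits (the weights are hypothetical hyperparameters)."""
    total = [0.0] * len(text_features)
    for feat, weight in zip(view_features, view_weights):
        for c, logit in enumerate(view_logits(feat, text_features)):
            total[c] += weight * logit
    return total


def ensemble(logits_3d: Sequence[float], logits_clip: Sequence[float],
             alpha: float = 0.5) -> List[float]:
    """Inference-time ensemble with a 3D-supervised network (additive form assumed)."""
    return [a + alpha * b for a, b in zip(logits_3d, logits_clip)]


if __name__ == "__main__":
    cloud = [(0.1, 0.2, 0.3), (0.1, 0.2, -0.5), (-0.9, 0.9, 0.0)]
    # One depth map per view; in PointCLIP each map would go through CLIP's visual encoder.
    maps = {name: project_to_depth_map(cloud, name, size=4) for name in VIEWS}
    # Placeholder embeddings standing in for CLIP's encoders (toy values, one per view).
    text_feats = [[1.0, 0.0], [0.0, 1.0]]  # e.g. prompts for two categories
    view_feats = [[1.0, 0.0], [1.0, 1.0], [0.2, 0.9]]
    logits = aggregate_views(view_feats, text_feats, view_weights=[1.0, 0.5, 0.5])
    print("predicted class:", max(range(len(logits)), key=logits.__getitem__))
```

Because the encoders are stubbed out, the numbers only illustrate the mechanics: each view casts a weighted vote over the same category texts, so no 3D training is needed for the zero-shot prediction, and the ensemble step simply adds the CLIP logits to those of a separately trained 3D network.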