PointCLIP: Point Cloud Understanding by CLIP
| Published in: | Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online) pp. 8542 - 8552 |
|---|---|
| Main Authors: | Zhang, Renrui; Guo, Ziyu; Zhang, Wei; Li, Kunchang; Miao, Xupeng; Cui, Bin; Qiao, Yu; Gao, Peng; Li, Hongsheng |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 01.06.2022 |
| Subjects: | |
| ISSN: | 1063-6919 |
| Abstract | Recently, zero-shot and few-shot learning via Contrastive Vision-Language Pre-training (CLIP) has shown inspirational performance on 2D visual recognition by learning to match images with their corresponding texts in open-vocabulary settings. However, it remains underexplored whether CLIP, pre-trained on large-scale 2D image-text pairs, can be generalized to 3D recognition. In this paper, we show that such a setting is feasible by proposing PointCLIP, which conducts alignment between CLIP-encoded point clouds and 3D category texts. Specifically, we encode a point cloud by projecting it onto multi-view depth maps and aggregate the view-wise zero-shot predictions in an end-to-end manner, which achieves efficient knowledge transfer from 2D to 3D. We further design an inter-view adapter to better extract the global feature and adaptively fuse the 3D few-shot knowledge into CLIP pre-trained in 2D. By fine-tuning only the adapter under few-shot settings, the performance of PointCLIP can be largely improved. In addition, we observe that the knowledge of PointCLIP and classical 3D-supervised networks is complementary. Via a simple ensemble during inference, PointCLIP contributes favorable performance enhancement over state-of-the-art 3D networks. Therefore, PointCLIP is a promising alternative for effective 3D point cloud understanding in the low-data regime with marginal resource cost. We conduct thorough experiments on ModelNet10, ModelNet40 and ScanObjectNN to demonstrate the effectiveness of PointCLIP. Code is available at https://github.com/ZrrSkywalker/PointCLIP. |
|---|---|
| Author | Zhang, Renrui; Guo, Ziyu; Zhang, Wei; Li, Kunchang; Miao, Xupeng; Cui, Bin; Qiao, Yu; Gao, Peng; Li, Hongsheng |
| Author_xml | 1. Zhang, Renrui (Shanghai AI Laboratory; zhangrenrui@pjlab.org.cn); 2. Guo, Ziyu (Peking University, School of CS and Key Lab of HCST); 3. Zhang, Wei (Shanghai AI Laboratory); 4. Li, Kunchang (Shanghai AI Laboratory); 5. Miao, Xupeng (Peking University, School of CS and Key Lab of HCST); 6. Cui, Bin (Peking University, School of CS and Key Lab of HCST); 7. Qiao, Yu (Shanghai AI Laboratory); 8. Gao, Peng (Shanghai AI Laboratory; gaopeng@pjlab.org.cn); 9. Li, Hongsheng (The Chinese University of Hong Kong, CUHK-SenseTime Joint Laboratory; hsli@ee.cuhk.edu.hk) |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DOI | 10.1109/CVPR52688.2022.00836 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan (POP) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP) 1998-present |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Applied Sciences |
| EISBN | 1665469463 9781665469463 |
| EISSN | 1063-6919 |
| EndPage | 8552 |
| ExternalDocumentID | 9878980 |
| Genre | orig-research |
| ISICitedReferencesCount | 303 |
| IngestDate | Wed Jun 25 06:01:13 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-i203t-a04b9ef45d9710381b6d08c2509a5003952080b711b4e25068f51adeee25ebb23 |
| PageCount | 11 |
| ParticipantIDs | ieee_primary_9878980 |
| PublicationCentury | 2000 |
| PublicationDate | 2022-June |
| PublicationDateYYYYMMDD | 2022-06-01 |
| PublicationDate_xml | – month: 06 year: 2022 text: 2022-June |
| PublicationDecade | 2020 |
| PublicationTitle | Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online) |
| PublicationTitleAbbrev | CVPR |
| PublicationYear | 2022 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 8542 |
| SubjectTerms | 3D from multi-view and sensors; Transfer/low-shot/long-tail learning; Vision + language; Fuses; Image recognition; Knowledge engineering; Point cloud compression; Three-dimensional displays; Training; Visualization |
| Title | PointCLIP: Point Cloud Understanding by CLIP |
| URI | https://ieeexplore.ieee.org/document/9878980 |
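
The abstract describes encoding a point cloud by projecting it onto multi-view depth maps and aggregating the view-wise predictions. The following is a minimal illustrative sketch of those two steps, not the authors' implementation: the function names, the orthographic projection, the depth scaling, and the weighted-sum aggregation are all assumptions made for illustration, and the per-view logits are placeholders for what a CLIP image/text encoder pair would produce.

```python
import math


def project_depth_map(points, axis=2, size=8):
    """Orthographic projection (an assumption) of points in [-1, 1]^3
    onto a size x size depth map viewed along `axis`.

    Each pixel keeps the nearest point, encoded in [0.5, 1.0];
    pixels that receive no point stay at 0.0.
    """
    u_axis, v_axis = [a for a in range(3) if a != axis]
    depth = [[0.0] * size for _ in range(size)]
    for p in points:
        # Map coordinates from [-1, 1] to pixel indices, clamped to the grid.
        u = min(size - 1, max(0, int((p[u_axis] + 1.0) / 2.0 * size)))
        v = min(size - 1, max(0, int((p[v_axis] + 1.0) / 2.0 * size)))
        # Smaller coordinate along the viewing axis is treated as nearer
        # and receives a larger value (hypothetical convention).
        d = 1.0 - (p[axis] + 1.0) / 4.0
        depth[v][u] = max(depth[v][u], d)
    return depth


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_view_logits(view_logits, view_weights):
    """Fuse per-view class logits with a weighted sum (assumed scheme),
    then convert the fused logits to class probabilities."""
    n_classes = len(view_logits[0])
    fused = [
        sum(w * logits[c] for w, logits in zip(view_weights, view_logits))
        for c in range(n_classes)
    ]
    return softmax(fused)


# Toy usage: three points, one view along z, and two placeholder views' logits.
points = [(0.0, 0.0, -1.0), (0.0, 0.0, 1.0), (0.9, 0.9, 0.0)]
depth = project_depth_map(points, axis=2, size=4)
probs = aggregate_view_logits([[2.0, 0.0], [0.0, 1.0]], [1.0, 1.0])
```

In a real pipeline each depth map would be passed through a pre-trained image encoder and compared against text embeddings of the category names to obtain the per-view logits; the repository linked in the abstract is the place to check how the authors actually do this.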